Searching and Scanning: a Review of Lawrence W. Stark's Vision Models


1.0  Executive  Summary 

This  report  provides  a  brief  summary  of  the  theories  and  research  of  Dr.  Lawrence  W.  Stark  and 
his  associates  related  to  searching  for  and  scanning  objects  in  the  visual  field.  The  primary  goal  of  this 
study  has  been  to  assess  the  applicability  of  the  theories  to  U.S.  Army  search  and  target  acquisition 
problems. Information that can be applied to the modeling of visual search for personnel, ground
vehicles,  and  helicopters  in  cluttered  terrain  is  of  particular  interest  here. 

The  Stark  model  refers  to  search  as  a  process  involving  active  eye  movements  that  cover  a  scene, 
with  the  goal  of  locating  a  specific  kind  of  object.  A  search  pattern  is  the  sequence  of  eye  movements 
used  to  cover  a  scene  that  includes  an  apparently  random  distribution  of  objects  that  could  appear 
anywhere  in  the  field  of  view.  A  search  path  is  a  similar  process,  but  is  used  to  cover  natural  scenes  in 
which  objects  have  expected  locations  (e.g.,  tanks  on  the  ground,  ships  in  the  water). 

Scanning  is  systematic  inspection  of  an  object  after  it  has  been  located,  to  compare  its  features 
with  those  of  stored  cognitive  models  and  to  complete  the  recognition  and  identification  process.  A 
scanpath is used to inspect familiar objects during normal viewing of scenes, people, and objects.
Scanpath theory suggests that eye movements are controlled by internal cognitive models and predicts
similar sequences of visual fixations for a given observer looking at a particular image.

Stark’s  research  has  demonstrated  that  glimpses  or  fixations  are  not  independent.  Instead,  the 
location  of  one  glimpse  is  linked  with  that  of  the  next.  This  result  suggests  that  using  mathematical 
models  which  assume  independent  events  may  not  be  appropriate,  at  least  for  cluttered  scenes.  Instead, 
Stark proposes using Markov models, which assume that the next event depends only on the immediately
preceding event. Stark has demonstrated the use of such models in predicting eye movements during the scanning
process. 
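
The Markov assumption can be illustrated with a minimal Python sketch. It estimates first-order transition probabilities from a recorded sequence of fixated regions. The function name, the region labels, and the example sequence are hypothetical and are not taken from Stark's experiments.

```python
from collections import Counter

def transition_matrix(fixations, regions):
    """Estimate first-order Markov transition probabilities.

    fixations: region labels in the order they were fixated.
    regions:   all region labels (defines row and column order).
    Returns a dict of dicts P, where P[a][b] estimates P(next = b | current = a).
    """
    counts = Counter(zip(fixations, fixations[1:]))  # count a -> b transitions
    matrix = {}
    for a in regions:
        row_total = sum(counts[(a, b)] for b in regions)
        # Rows with no observed departures are left as all zeros.
        matrix[a] = {b: (counts[(a, b)] / row_total if row_total else 0.0)
                     for b in regions}
    return matrix

# Hypothetical fixation sequence over three regions of an image.
sequence = ["A", "B", "C", "A", "B", "C", "A", "B", "A"]
P = transition_matrix(sequence, ["A", "B", "C"])
for region, row in P.items():
    print(region, row)
```

Each row of the result describes where the eye tends to go next from a given region, which is the dependence that an independent-glimpse model cannot represent.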

Simple  algorithmic  models  have  been  proposed  by  Stark  for  both  the  search  and  the  scanning 
processes,  as  discussed  in  this  report.  A  comprehensive  top-level  theoretical  model  of  visual  search  has 
been  developed,  incorporating  the  simpler  models.  This  theoretical  model  has  been  used  to  prepare  a 
very  simple  prototype  computer  program  that  predicts  the  number  of  targets  detected,  number  of  false 
alarms,  and  total  search  time,  for  a  user-defined  scenario.  While  the  program  includes  too  many 
simplifications  and  approximations  to  be  useful  at  present,  it  demonstrates  the  feasibility  of  developing 
more  comprehensive  programs  based  on  the  theoretical  model,  if  necessary  experimental  data  can  be 
obtained. 

Stark’s  research  and  models  have  been  reviewed  for  this  study  and  the  useful  and  most  promising 
components  identified  (Section  6.0).  Discrepancies  and  model  omissions  also  are  noted  so  efforts  can 
be  made  to  improve  the  model  if  desired  and  to  increase  its  usefulness  for  U.S.  Army  search  and  target 
acquisition  modeling. 




2.0  Introduction 


2.1  Background 

Lawrence W. Stark has been a professor at the University of California, Berkeley, for many
years.  He  divides  his  teaching  efforts  there  among  various  engineering,  biology,  and  medicine 
departments,  including  the  Physiological  Optics  and  Neurology  Departments.  Stark  pioneered  the 
application  of  control  and  information  theories  to  neurological  systems.  His  current  research  interests 
relate  to  bioengineering,  with  emphasis  on  human  and  robotic  control  of  movement  and  vision. 

Stark  and  his  coworkers  have  published  articles  recently  in  Optometry  and  Vision  Science,  IEEE 
Transactions  on  Systems,  Man,  and  Cybernetics,  IEEE  Transactions  on  Robotics  and  Automation, 
The  Journal  of  the  Institute  of  Television  Engineers  of Japan,  Experimental  Brain  Research,  Annals 
of  Biomedical  Engineering,  and  Vision  Research.  He  has  contributed  chapters  to  numerous  books, 
including Visual Search: II.¹

For individuals interested in research related to target acquisition, Stark is noted especially for his
extensive work on the human search process. His 1971 article in Scientific American, Eye Movements
and Visual Perception,² provided both a theoretical and a practical basis for follow-on work critical to
understanding  how  humans  inspect  and  recognize  objects.  More  recent  work  on  search  patterns  and 
search  paths  has  added  significantly  to  models  of  human  visual  processes,  as  these  relate  to  target 
detection  and  identification. 

2.2  Purpose  of  Report 

The  primary  purpose  of  this  report  is  to  assess  the  applicability  of  Stark’s  theories  to  U.S.  Army 
search  and  target  acquisition  conditions  and  situations,  for  purposes  of  modeling  and  prediction.  The 
goals  are  to  determine  (1)  whether  the  theory  shows  promise,  (2)  how  close  it  is  to  being  useful,  (3) 
what  discrepancies  or  omissions  remain  in  the  theory,  and  (4)  how  (if  possible)  any  discrepancies  or 
omissions  can  be  addressed. 

The  following  sections  provide  a  brief  summary  of  some  of  Stark's  work  related  to  searching  for 
objects  in  the  visual  field  and  scanning  those  objects.  The  objective  of  this  summary  is  to  make  this 
information  accessible  to  target  acquisition  researchers  who  may  not  have  Stark's  original  articles 
available,  or  who  need  to  know  the  gist  of  this  information  but  do  not  require  the  full  details  included  in 
the  published  literature.  Terms  and  their  definitions  included  in  this  report  are  based  on  usage  in  the 
available  Stark  literature;  as  a  result,  some  terms  and  definitions  may  differ  from  those  utilized  by  other 
vision  and  target  acquisition  researchers. 

This  brief  report  does  not  provide  a  comprehensive  theory  of  search,  but  instead  emphasizes  the 
definitions, information, and model components related to acquisition of ground targets by humans
using  direct  vision,  image  magnification  systems  such  as  binoculars,  and  electro-optical  sensors  such 
as  television  systems.  Thus  our  primary  interest  is  in  Stark's  work  that  can  be  applied  to  the  modeling 
of  visual  search  for  personnel,  ground  vehicles,  and  helicopters  in  cluttered  terrain. 


1 Brogan, D., and Carr, K., eds. Visual Search: II, Taylor and Francis, London, 1992.

2 Noton, D., and Stark, L. Eye Movements and Visual Perception, in Scientific American, vol. 224, no. 6, pp.
34-43, June 1971.

3.0 Visual Search


3.1  Searching  Versus  Scanning 

Eye  movements  are  necessary  for  locating  and  identifying  objects,  since  detailed  visual 
information can be obtained only through the fovea.³ Unless the object of interest has an angular
subtense of only 1 to 2 arc degrees, the eye must move in order to fixate on it and inspect it in detail.
For  example,  if  a  scene  subtends  a  horizontal  angle  of  20  degrees  at  the  eye,  the  observer  must  move  his 
or  her  eyes  and  sequentially  look  around  the  scene  at  the  parts  regarded  as  features.  Features  are 
tentatively  located  by  peripheral  vision,  then  fixated  directly  for  detailed  inspection. 

As  the  term  is  used  by  Stark,  search  is  a  process  involving  active  eye  movements  that  cover  a 
scene  (the  field  of  regard),  with  the  goal  of  locating  a  specific  kind  of  object  (referred  to  as  a  target). 
The  search  process  ends  when  an  object  of  interest  is  located  and  detected. 

Once  the  object  is  detected,  Stark  considers  that  the  scanning  process  begins.  Scanning  allows  the 
observer  systematically  to  inspect  the  object  and  to  compare  its  features  with  those  of  stored  cognitive 
models,  in  order  to  complete  the  recognition  and  identification  processes.  Identification  is  considered 
more difficult than recognition, and both are more difficult than detection.


If a scene includes clutter (random objects that possibly could be confused with the targets), the
search  process  may  be  more  complicated,  as  is  illustrated  in  Figure  1.  While  searching,  the  observer 
may  have  to  compare  objects  that  could  be  either  a  target  or  clutter  with  at  least  a  top-level  cognitive 
model  of  the  object  of  interest,  for  detection  to  occur  prior  to  beginning  the  recognition  and 
identification  process.  Thus  the  distinction  between  searching  and  scanning  may  blur  under  cluttered 
conditions. 


Figure 1. Search and Scan Processes With and Without Clutter Present.


3 See the Glossary at the end of this report for definitions of terms as used in Stark's reports. Note that
Stark’s  terminology  differs  in  some  instances  from  that  used  by  other  search  and  target  acquisition 
researchers. 

3.2 The Search Process

Search  can  be  defined  as  a  process  that  involves  active  eye  movements  over  the  field  of  regard, 
with  the  goal  of  locating  and  recognizing  a  desired  object,  referred  to  as  a  target.  Search  ends  when  an 
object  of  interest  is  detected  and  can  be  inspected  to  determine  whether  it  is  a  target.  The  search 
process then may be followed by two other processes or events: recognition and identification of a
target. 

Various  search  behaviors  and  strategies  are  observed  in  humans,  depending  on  the  conditions  of 
the  search  process.  In  particular,  visual  search  for  random  targets  in  an  “unorganized”  two-dimensional 
(2-D)  display  is  conducted  differently  than  is  search  for  "natural"  targets,  especially  those  imbedded  in 
a  three-dimensional  (3-D)  "natural"  scene.  The  sequences  of  eye  movements  used  to  cover  an 
unorganized  search  area  are  referred  to  by  Stark  as  search  patterns.  Eye  movement  sequences  used  to 
examine a natural scene are called search paths.⁴

Demonstration  that  humans  consistently  use  search  patterns  and  search  paths  puts  into  doubt  a 
basic  premise  of  Koopman's  classic  detection  theory,  which  assumes  that  each  glimpse  or  fixation  is 
independent of the others.⁶ While independence of glimpses may hold for searching for ships at sea
(where some noise is present, but no clutter), there appears to be a significant cognitive component to
the search process in terrain situations. This cognitive component results in linking the location of one
glimpse  with  that  of  the  next,  so  that  the  focus  of  each  fixation  depends  on  that  of  the  previous  fixation. 
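
For comparison, the independence premise can be made concrete. If every glimpse detects the target with the same probability p, independently of all other glimpses, the probability of at least one detection in n glimpses is 1 - (1 - p)^n. The Python sketch below evaluates this baseline; the function name and the value of p are hypothetical.

```python
def cumulative_detection(p_glimpse, n_glimpses):
    """Probability of at least one detection in n independent glimpses."""
    if not 0.0 <= p_glimpse <= 1.0:
        raise ValueError("p_glimpse must be between 0 and 1")
    return 1.0 - (1.0 - p_glimpse) ** n_glimpses

# Hypothetical single-glimpse detection probability of 0.1.
for n in (1, 5, 10):
    print(n, round(cumulative_detection(0.1, n), 3))
```

When the location of each fixation depends on the previous one, successive glimpses are no longer independent trials, so this expression cannot be applied directly.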

3.3  Search  Patterns 

Search  patterns  are  considered  to  be  efficient,  repetitive,  idiosyncratic  sequences  of  eye 
movements  during  which  the  eyes  systematically  cover  an  entire  2-D  scene.  These  eye  sequences  occur 
during  observation  of  an  apparently  random  distribution  of  objects  in  a  search  area,  when  the  observer 
is  not  familiar  with  the  spatial  organization  (if  any)  of  the  objects.  Examples  include  searching  from  the 
air  for  individual  ships  on  the  open  ocean  or  trying  to  find  specific  letters  randomly  embedded  in  a 
uniform  background  (typical  of  much  laboratory  target  acquisition  research).  Search  patterns  appear  to 
be  modes  of  covering  a  search  area  in  an  efficient  and  thorough  manner  when  no  information  is 
available  to  shape  the  search.  Patterns  usually  develop  naturally,  without  instructions  or  training. 
However,  practice  does  improve  performance. 

Search  patterns  will  vary  as  a  function  of  various  factors,  including  the  instantaneous  field  of  view 
of the search area ("window" size), which simulates the way the observer's field of view, including
peripheral vision, might be restricted while using an optical device. A fine-grained search pattern is used as
observers move small windows over the scene, and a coarser one is observed with larger windows (see
Figure  2).  Thus  observers  approximate  what  would  be  considered  optimal  visual  coverage  of  a  search 
area. 
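
The link between window size and the coarseness of the search pattern can be illustrated with a simple tiling calculation. The Python sketch below assumes a rectangular scene covered by non-overlapping rectangular windows. The scene and window dimensions are hypothetical, and real observers overlap successive window positions.

```python
import math

def window_positions(scene_w, scene_h, win_w, win_h):
    """Minimum number of non-overlapping window placements covering a scene.

    All sizes are in arc degrees. Returns (columns, rows, total placements).
    """
    cols = math.ceil(scene_w / win_w)
    rows = math.ceil(scene_h / win_h)
    return cols, rows, cols * rows

# Hypothetical 48 x 48 degree scene viewed through two window sizes.
print(window_positions(48, 48, 12, 12))  # small window: fine pattern
print(window_positions(48, 48, 24, 24))  # large window: coarse pattern
```

Halving the window width and height quadruples the number of placements, which is consistent with the finer patterns observed for small windows.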


4 Stark, L. W., and others. Visual Search in Virtual Environments, in Proceedings of SPIE Conference on
Human Vision, Visual Processing, and Digital Display, San Jose, CA, February 1992.

5 Stark, L., and others. Keynote Lecture: Search Patterns and Search Paths in Human Visual Search, in
Visual Search: II, Taylor and Francis, London, 1992.

6 Office of the Chief of Naval Operations. Search and Screening, by B.O. Koopman. Navy Dept.
Washington, DC, 1946.

Search patterns used for small "windows" on a scene (e.g., 12 arc degrees wide by 11.5 arc
degrees high) usually are regular and may be described as horizontal, vertical, oscillating, or circular.
Examples are provided in Figure 3. Horizontal patterns are observed most frequently. If horizontal
"guidelines"  are  added  to  the  scene,  observers  tend  to  use  these  to  structure  the  search,  and  target 
location  performance  (speed  and  accuracy)  is  improved  (see  Figure  3  (f)). 

As the window onto a scene grows larger (e.g., 24 arc degrees wide by 23 arc degrees high),
irregular (essentially random) window-movement patterns often replace the more regular patterns
observed with smaller windows (see Figure 2 (d)). This may be due to initial peripheral viewing of the
edges  of  the  display  to  pick  up  potential  targets,  followed  by  foveal  fixations  to  examine  these  objects 
more  closely.  Presentation  of  a  large  number  of  targets  in  a  cluttered  scene  also  results  in  irregular 
search  patterns. 


Figure 2. Examples of the Effect of Window Size. (a) With large windows, coarse window-movement
patterns are observed. (b) Fine patterns are seen with small windows. (c) and (d) Irregular patterns
become more common as window size increases. [See Footnote 4]




Figure 3. Examples of Observed Regular Search Patterns. The patterns are referred to as (a) horizontal,
(b) vertical, (c) and (d) oscillating, and (e) circular. (f) Addition of horizontal guidelines improves target
location performance. [See Footnote 5]


3.4  Search  Paths 

Recall that search patterns are eye movement sequences used to cover an apparently random
distribution of objects. Search path is Stark's term for a sequence of eye movements that also is
efficient, repetitive, and idiosyncratic, but that is used to cover natural (real world) scenes. Search paths
utilize  only  a  small  set  of  fixations  to  check  a  limited  number  of  features  in  the  scene  —  generally  the 
targets.  This  sequence  occurs  during  observation  of  natural  scenes  with  a  small  number  (7  ±  2)  of 
naturally-distributed targets, that is, targets located where they might be expected in the real world.

This  strategy  is  used  when  the  background  should  suggest  reasonable  and  easily-remembered  places  for 
targets to be placed in sensible fashion. Even when the actual image is shown in a 2-D display, the
"mind's eye" apparently reconstructs the corresponding 3-D format, based on prior experience.

Search  paths  tend  to  be  unique  to  the  observed  scene.  They  are  shaped  and  driven  by  spatial 
models  of  the  spatial  organization  of  target  objects  in  a  normal  3-D  scene,  and  by  knowledge 
accumulated  from  previous  experience.  Example  search  paths  are  shown  in  Figure  4.  The  three  paths 
represent  the  same  observer’s  search  path  of  the  same  scene  on  three  separate  occasions.  Notice  that 
even  over  time  an  individual’s  search  idiosyncrasies  remain  constant,  for  a  given  scene. 

Most  observers  sweep  out  the  path  from  left  to  right.  Observers  vary  widely  in  speed  of  search 
and  in  the  length  of  the  search  path  but,  as  noted,  a  given  observer's  search  path  is  similar  for  successive 
runs  when  the  scene  configuration  remains  constant. 




Figure 4. Examples of Observed Search Paths. (a) The naturalistic scene is shown, with the observer's
search path superimposed. Trucks were the target objects; vans and cars were clutter objects. (b), (c),
and (d) Three other examples are shown of the same observer's search paths while viewing the same
scene. Boxes indicate target detection. Searching along the paths was at a rate of approximately 5 dots
per second. [See Footnote 5]

3.5 Concepts Useful for Modeling Search

Targets  usually  are  man-made  objects  in  the  real  world  whose  images  are  often  made  up  of  vertical 
and horizontal lines. Processing the scene to remove everything except such straight lines is used in
automatic target recognition, to help differentiate targets from background (see the example in Figure
5). Stark suggests that this technique might serve to model the preattentive detection of targets in a
distracting background. However, researchers working in modeling of low-observable objects suggest
that reduction of a scene to horizontal and vertical lines may be necessary but not sufficient for
detection. The existence of angles and repetitive patterns within the target is equally important.⁷
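
The idea of keeping only horizontal and vertical lines can be sketched for a small binary image. The Python example below is a crude stand-in, not the processing used in automatic target recognition systems: it keeps a pixel only if it lies on a sufficiently long horizontal or vertical run of set pixels. The function names, the `min_len` parameter, and the test scene are hypothetical.

```python
def long_runs(values, min_len):
    """Return indices belonging to runs of at least min_len consecutive 1s."""
    keep, start = set(), None
    for i, v in enumerate(list(values) + [0]):  # trailing 0 closes a final run
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            if i - start >= min_len:
                keep.update(range(start, i))
            start = None
    return keep

def keep_straight_lines(image, min_len=3):
    """Keep only pixels on horizontal or vertical runs of length >= min_len.

    image: list of rows, each a list of 0/1 values.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):                      # horizontal runs
        for c in long_runs(image[r], min_len):
            out[r][c] = 1
    for c in range(width):                       # vertical runs
        column = [image[r][c] for r in range(height)]
        for r in long_runs(column, min_len):
            out[r][c] = 1
    return out

scene = [
    [1, 1, 1, 1, 0],  # horizontal line: kept
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],  # start of a diagonal stroke: removed
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
for row in keep_straight_lines(scene):
    print(row)
```

Note that this filter discards diagonal strokes as well as curves, so it would also remove the angles that other researchers consider important for detection.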


Figure 5. Examples of the Effect of Removing All Curved Lines from Scene. (a) Original scene, with
example of target at top. (b) Same scene with everything except horizontal lines removed. (c) Same
scene with everything except vertical lines removed. (d) Same scene with only horizontal and vertical lines
remaining. The target (boxed) can be discriminated from non-targets by its lack of the horizontal weapon
barrel. [Personal communication from L.W. Stark]


7 Doll, T.J., McWhorter, S.W., and Schmieder, D.E. Computational Model of Human Visual Search and
Detection. Georgia Institute of Technology, March 1994.

Visual search also might be modeled as a series of searches by a matched filter (possibly a
template  with  the  brightness  characteristics  and/or  the  outlines  of  the  target)  scanning  a  graphical 
artificial  scene  on  a  computer  to  pick  out  the  targets  from  the  background.  Search  time  and  number  of 
errors  (commission  and  omission)  can  be  used  as  measures  of  filter  performance. 
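
A template scan of this kind can be sketched in a few lines of Python. The example is a simplified stand-in for a matched filter: it slides a binary template over a binary scene and scores each position by the fraction of agreeing pixels. The function names, the threshold, and the tiny scene are hypothetical.

```python
def match_score(scene, template, top, left):
    """Fraction of template pixels equal to the scene pixels beneath them."""
    th, tw = len(template), len(template[0])
    hits = sum(scene[top + r][left + c] == template[r][c]
               for r in range(th) for c in range(tw))
    return hits / (th * tw)

def matched_filter(scene, template, threshold):
    """Slide the template over the scene; return (row, col, score) detections."""
    th, tw = len(template), len(template[0])
    detections = []
    for top in range(len(scene) - th + 1):
        for left in range(len(scene[0]) - tw + 1):
            score = match_score(scene, template, top, left)
            if score >= threshold:
                detections.append((top, left, score))
    return detections

template = [[1, 1],
            [1, 0]]
scene = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
print(matched_filter(scene, template, threshold=1.0))
```

Lowering the threshold admits partial matches, which raises the number of commission errors (false alarms), while raising it increases omission errors on degraded targets.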

Detection, recognition, and identification can be considered a sequence of tasks or processes, each
more difficult than the one before. Based on this definition, a hierarchy of search can be set up. For
example,  the  filter  at  first  can  search  crudely  for  probable  targets,  using  as  criterion  that  25  pixels  per 
target  must  match  the  template.  The  detected  probable  targets  then  can  be  compared  with  the  template 
more  closely,  requiring  a  match  of  625  pixels  per  target.  This  approach  yields  major  reductions  both  in 
errors  and  in  required  search  time,  when  compared  with  a  single-step  filter  process. 
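
The two-step hierarchy can be sketched as a coarse-to-fine template comparison. In the Python example below, one interpretation of the pixel counts is assumed: the template is 25 by 25 pixels (625 in total), the coarse pass compares only every fifth pixel in each direction (25 pixels), and the fine pass compares all 625. The function names, the frame-shaped template, and the scene are hypothetical.

```python
def coarse_to_fine(scene, template, step=5, coarse_min=25, fine_min=625):
    """Two-stage template search.

    Stage 1 compares a sparse grid of template pixels at every position.
    Stage 2 compares every template pixel, but only at stage 1 candidates.
    Returns (candidates, targets) as lists of (row, col) positions.
    """
    th, tw = len(template), len(template[0])
    coarse_pts = [(r, c) for r in range(0, th, step) for c in range(0, tw, step)]
    all_pts = [(r, c) for r in range(th) for c in range(tw)]

    def count(points, top, left):
        return sum(scene[top + r][left + c] == template[r][c] for r, c in points)

    candidates, targets = [], []
    for top in range(len(scene) - th + 1):
        for left in range(len(scene[0]) - tw + 1):
            if count(coarse_pts, top, left) >= coarse_min:
                candidates.append((top, left))
                if count(all_pts, top, left) >= fine_min:
                    targets.append((top, left))
    return candidates, targets

# Hypothetical 25 x 25 template: a one-pixel rectangular frame.
size = 25
template = [[1 if r in (0, size - 1) or c in (0, size - 1) else 0
             for c in range(size)] for r in range(size)]

# Hypothetical 30 x 30 scene with the template pasted at row 2, column 3.
scene = [[0] * 30 for _ in range(30)]
for r in range(size):
    for c in range(size):
        scene[2 + r][3 + c] = template[r][c]

candidates, targets = coarse_to_fine(scene, template)
print(len(candidates), targets)
```

The full 625-pixel comparison runs only at positions that pass the 25-pixel check, which is where the saving in search time comes from.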

When  the  goal  is  to  model  realistic  human  search  times  and  accuracy,  scene  clutter  cannot  be 
simulated  simply  by  using  Gaussian  noise  to  "obscure"  a  scene  and  its  embedded  targets.  Instead, 
clutter  must  be  modeled  as  unwanted  signals  that  are  similar  to  the  signals  of  interest  (the  targets)  and 
so  could  be  confused  with  them. 

Images  of  scenes  and  of  real  targets  can  be  captured  using  video  cameras.  These  scenes  then  can 
be  modeled  as  simple  black  and  white  images  to  minimize  computing  time.  Addition  of  gray  scale 
values and the degrading of image resolution would match the human's visual characteristics more
accurately, if the captured images are sharper than what the unaided eye can resolve.

To  reduce  computing  time  and  add  robustness,  regions  of  interest  in  the  scene  can  be  selected 
serially  for  processing.  A  detection  threshold  then  can  be  set  for  a  given  region,  to  specify  the  degree  of 
similarity between template and target required for a match. However, defining "regions of interest" is
no  trivial  process,  nor  is  setting  a  realistic  threshold  value,  especially  on  a  region-by-region  basis. 

3.6  Simple  Search  Model 

A  schematic  representation  of  a  simple  search  model  based  on  the  above  concepts  from  various 
Stark  reports  is  shown  in  Figure  6.  The  model  includes  the  following  processes. 

•  Images  of  a  scene  and  targets  are  captured  and  clutter  is  incorporated. 

•  The  scene  is  processed  to  enhance  possible  targets  by  removing  everything  except  straight  lines. 

•  The  scene  is  modified  to  mimic  human  visual  resolution. 

•  Regions  of  interest  are  defined,  and  thresholds  are  set  to  simulate  target  detection  and  recognition 
criteria. 

•  Matched  filters  are  used  to  compare  the  images  of  the  regions  of  interest  with  predefined  criteria  for 
targets  and  to  select  or  reject  objects  in  those  regions  as  targets. 

Figure 6. Schematic Representation of Stark's Visual Search Model.

4.0 Visual Scan


4.1  The  Scanning  Process 

In  the  Stark  view  of  the  search  process,  scanning  is  considered  to  be  a  serial  process  involving 
active,  regular  eye  movements  over  an  already-detected  target,  with  the  goal  of  recognizing  and 
possibly  identifying  what  is  being  examined.  The  observer  inspects  the  object  and  compares  its  features 
with  those  of  stored  internal  representations  (cognitive  models).  Scanning  usually  ends  when  the 
recognition  or  identification  process  is  complete. 

Visual  recognition  involves  storing  and  retrieving  memories.  Nerve  cells  in  the  brain's  visual 
cortex  are  activated  and  an  image  of  the  object  being  viewed  is  formed  in  the  "mind's  eye."  The 
human's  memory  system  must  contain  an  internal  representation  of  every  object  that  is  to  be  recognized. 
Recognition  of  an  object  when  it  is  encountered  (after  being  observed  previously)  is  the  process  of 
matching  it  with  its  internal  representation  in  the  memory  system. 

It is likely that an object's internal representation is a piecemeal affair, an assemblage of parts or
features.  The  serial  recognition  hypothesis  states  that,  during  recognition,  the  features  of  the  internal 
representation  are  matched  serially  with  the  features  of  the  actual  object,  step  by  step.  Successful 
matching  of  all  features  completes  recognition.  Figure  7  illustrates  this  serial  process.^ 

The average observer takes longer to recognize a target object than is needed to reject a nontarget
object. Whenever a nontarget object fails to match some feature of the internal representation, that
object  can  be  rejected  without  further  scrutiny,  whereas  target  objects  must  be  checked  on  all  features. 
Observers  also  take  longer  to  recognize  complex  target  objects  than  to  recognize  simple  ones,  since 
more  features  must  be  checked  in  a  complex  object.  On  the  other  hand,  the  internal  representations  of 
well-known,  very  simple  objects  appear  to  be  holistic,  so  that  recognition  is  a  rapid,  parallel  process. 
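
The timing asymmetry described above follows directly from serial matching with early rejection, as the Python sketch below suggests. The feature names and the objects are hypothetical, and the number of feature checks is used as a rough proxy for recognition time.

```python
def serial_match(observed, representation):
    """Compare features one at a time, stopping at the first mismatch.

    observed:       dict mapping feature name to observed value.
    representation: ordered list of (feature name, expected value) pairs.
    Returns (recognized, number of feature checks made).
    """
    checks = 0
    for name, expected in representation:
        checks += 1
        if observed.get(name) != expected:
            return False, checks  # reject without further scrutiny
    return True, checks

# Hypothetical internal representation of a tank.
tank_model = [("tracks", True), ("turret", True), ("barrel", True), ("hull", "low")]
tank = {"tracks": True, "turret": True, "barrel": True, "hull": "low"}
truck = {"tracks": False, "turret": False, "barrel": False, "hull": "high"}

print(serial_match(tank, tank_model))   # target: every feature is checked
print(serial_match(truck, tank_model))  # nontarget: rejected at the first feature
```

A representation with more features requires more checks before a target is accepted, which matches the longer recognition times reported for complex objects.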

When  viewing  simple  pictures  such  as  line  drawings,  fixations  tend  to  cluster  around  unpredictable 
or unusual details, particularly unpredictable contours such as angles. Thus sharp curves probably are
important features for visual identification, and angles may be the principal features the brain uses to
store and to recognize drawings (see Figure 8). This is consistent with research that has demonstrated
the  presence  of  angle-detecting  neurons  in  various  animals. 

Use of angles for image storage makes sense from a space optimization (data compression) point
of view. If an object is divided into connected straight segments, a segment's length and the angle that
connects it to the next segment can be stored, rather than storing the entire object's construction. This is
analogous to storage systems for large matrices, where positions of non-zero elements and their values
are stored instead of storing perhaps 10,000 elements of which only 1 per cent are nonzero.⁹
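
The segment-and-angle storage scheme can be sketched as an encoder and decoder for a polyline. The Python example below is one possible convention, chosen for illustration: each segment is stored as its length together with the change in heading needed to reach it from the previous segment (the first heading is measured from the x axis). The function names and the example outline are hypothetical.

```python
import math

def encode(points):
    """Encode a polyline as (segment length, heading change in degrees) pairs."""
    pairs, heading = [], 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        new_heading = math.degrees(math.atan2(y1 - y0, x1 - x0))
        pairs.append((math.hypot(x1 - x0, y1 - y0), new_heading - heading))
        heading = new_heading
    return pairs

def decode(start, pairs):
    """Rebuild the polyline vertices from a start point and the stored pairs."""
    x, y = start
    heading = 0.0
    points = [(x, y)]
    for length, turn in pairs:
        heading += turn
        x += length * math.cos(math.radians(heading))
        y += length * math.sin(math.radians(heading))
        points.append((x, y))
    return points

outline = [(0, 0), (4, 0), (4, 3), (0, 3)]  # three sides of a rectangle
stored = encode(outline)
print(stored)
print(decode(outline[0], stored))
```

Only one pair is stored per segment regardless of how many pixels the segment spans, which is the source of the compression.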


8 It should be noted that some other vision researchers consider recognition to be less sequential; they
hypothesize  instead  that  numerous  object  parameters  are  considered  simultaneously  during  the  procedure. 

9 Hacisalihzade, S.S., Stark, L.W., and Allen, J.S. Visual Perception and Sequences of Eye Movement
Fixations: a Stochastic Modeling Approach, in IEEE Transactions on Systems, Man, and Cybernetics,
vol. 22, no. 3, May/June 1992.

Figure 7. Eye Movement Regularities While Reviewing an Image. (a) Bust of Queen Nefertiti viewed by
the observer. (b) Eye movements recorded by A.L. Yarbus, Institute for Problems of Information
Transmission, Moscow. The eyes seem to visit the features of the head cyclically, following fairly regular
pathways, rather than crisscrossing the picture at random.

4.2  Scanpaths 

Scanpath is Stark's term for repetitive and idiosyncratic sequences of eye movements for
inspecting and recognizing particular familiar objects during normal viewing of scenes, people, and
objects. It thus relates to the search path process, which also is used for familiar types of scenes. The
scanpath is shaped by a cognitive model of the guessed object and the organization of its subfeatures.
This cognitive model controls active looking to result in efficient acquisition of visual information.
Scanpath sequences occupy from about 25 to 30 per cent of the observer's viewing time, the rest
consisting  of  less  regular  eye  movements.  On  occasion,  scanpaths  are  not  evident  when  some  simple, 
very  familiar  objects  are  viewed. 

10 Noton, D., and Stark, L. Eye Movements and Visual Perception, in Scientific American, vol. 224, no. 6, pp.
34-43, June 1971.




Figure 8. Example of Importance of Angles in Recognition. Fred Attneave III, University of Oregon,
selected  the  38  points  of  greatest  curvature  in  a  picture  of  a  sleeping  cat  and  joined  them  with  straight 
lines,  eliminating  all  other  curves.  The  result  is  still  easily  recognizable.  [See  Footnote  10] 


The purpose of scanning is to identify the target. The order of fixations in a scanpath is by no
means random. The lines representing the saccades form broad bands from point to point. They do not
crisscross the picture at random as would be expected if the eyes visited different features repetitively in
a random order. The overall record indicates a series of cycles. In each cycle, the eyes visit the main
features of the picture, following rather regular pathways from feature to feature.

Scanpaths tend to be unique to the individual and to the observed scene or object. That is, each
observer usually has the same scanpath when viewing the same picture, but a different one for a
different picture (thus a scanpath is not the result of some fixed habit of eye movement). Two observers
viewing the same picture will have different scanpaths, indicating that these paths do not simply result
from peripheral feature detectors that control eye movements for all observers.

Visual scanpaths are not observed for objects that are small enough to be viewed in a single
fixation. However, observers fixating on such objects report that their attention is shifted from one area
of the object to another over time, even though eye movements are not required. Stark postulates that,
for small objects, a sequence of internal shifts of attention may replace eye movements. Features are
processed serially and the scanpath is followed as dictated by the features of interest. Thus each motor
memory trace in the sequence of features records a shift of attention that can be executed either
externally as an eye movement or internally as an attention shift.

4.3 Scanpath Theory and Models

Given  that  the  target  already  has  been  detected,  Stark’s  scanpath  theory  suggests  that  eye 
movements  are  controlled  by  internal  cognitive  models  already  present  in  the  brain,  and  predicts  similar 
sequences of visual fixations for a given observer looking repeatedly at the image of this particular
target. It is proposed that, in the internal representation or memory of the picture, the features are linked
together in sequence by the memory of the eye movements required to look from one feature to the next.
Thus the eyes tend to move from feature to feature in a fixed order, as they scan the picture.¹¹,¹²
However,  for  realism,  some  randomness  usually  is  included  in  modeling  the  generation  of  scanpaths. 

A  scanpath  feature  ring  has  been  proposed  as  a  serial  model  of  the  human's  internal  representation 
of  objects.  The  model  maintains  that  representations  of  objects  are  composed  of  sensory  memory  traces 
recording  object  features  and  of  motor  memory  traces  of  the  eye  movements  from  one  feature  to  another. 

In  the  feature  ring  model,  as  an  observer  views  an  object  for  the  first  time  and  becomes  familiar 
with  it,  he  or  she  alternately  records  a  feature  of  the  object  and  the  eye  movement  required  to  reach  the 
next  feature.  The  memory  traces  of  the  feature  ring  thus  are  laid  down,  as  both  sensory  and  motor 
activities  are  recorded.  The  feature  ring  establishes  a  fixed  ordering  of  features  and  eye  movements, 
corresponding  to  the  scanpath  on  the  object.  When  the  object  is  next  encountered,  the  observer 
recognizes  it  by  matching  it  with  the  feature  ring,  which  is  the  object’s  internal  representation  in 
memory. Matching consists of verifying successive features and carrying out the intervening eye
movements,  as  directed  by  the  feature  ring. 

4.3.1  Markov  Models  of  Scanpaths 

Scanpath  theory  predicts  similar  sequences  of  visual  fixations  for  an  observer  looking  at  a 
particular image. The degree of similarity usually is determined by visual inspection of the fixation
sequences. However, it has been proposed that the sequence of eye fixations of a given scanpath can be
modeled as a Markov matrix. Two such matrices for two scanpaths then can be compared by
subtraction, to determine the error or statistical discordance matrix between the two. The result then can
be  converted  to  a  scalar  measure  of  the  statistical  discordance  to  obtain  a  numerical  value  for  scanpath 
similarity. 


Stark, L.W. Top-down Vision in Humans and Robots, in Proceedings of SPIE Conference on Human
Vision, Visual Processing, and Digital Display, San Jose, CA, February 1993.

Stark,  L.W.  New  Quantitative  Evidence  for  the  Scanpath  Theory:  Top-Down  Vision  in  Humans  and 
Robots,  in  Proceedings  of  the  First  Meeting  of  the  International  Society  of  Theoretical  Neurobiology, 
Milano,  January  1993. 

Hacisalihzade,  S.S.,  Stark,  L.W.,  and  Allen,  J.S.  Visual  Perception  and  Sequences  of  Eye  Movement 
Fixations:  a  Stochastic  Modeling  Approach,  in  IEEE  Transactions  on  Systems,  Man,  and  Cybernetics, 
vol  22,  no.  3,  p.  474,  May/June  1992. 

The following example of this process is based on Figure 9, which shows the eye movements made
by an observer viewing for the first time a drawing adapted from Paul Klee's Old Man Figuring. The
image can be divided into seven regions of interest, the hand (A), mouth (B), nose (C), left eye (D),
right eye (E), neck (F), and ear (G). Call these regions states in which the fixations must be located and
postulate that transitions from one state to another have certain probabilities. The result can be
described as a Markov process.


Figure 9. Eye Movements While Viewing Klee's Old Man Figuring. The letters (A through G) and
corresponding circles indicate the seven regions of interest about which fixations are clustered. [See
Footnote 10]


The sequence of fixations in Figure 9 is BBFFAAAABCEDCG. That is, the observer fixates first
on B, looks at B again, then moves to F. After another glance at F and four glances at A, fixation
returns again to B, then to C. Of the three transitions that leave B, one returns to B, one goes to F, and
one goes to C. Therefore, the probability that the fixation state will transition from B to
B is 0.33, from B to F is 0.33, and from B to C is 0.33. The total sequence (and numerous other
fixation sequences that fit this probability distribution) can be generated from the following matrix.
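
As an illustrative sketch (not the authors' code), the transition probabilities for this sequence can be estimated by counting adjacent pairs of fixations. The function name `transition_matrix` and the choice to leave an all-zero row for a state with no outgoing transitions (G in this sequence) are assumptions made for the example.

```python
STATES = "ABCDEFG"  # hand, mouth, nose, left eye, right eye, neck, ear


def transition_matrix(fixations, states=STATES):
    """Estimate first-order transition probabilities from a fixation string.

    Row k holds the probabilities of moving from states[k] to each state.
    A state with no outgoing transitions keeps an all-zero row (assumption).
    """
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    # Count every adjacent pair (source state, destination state).
    for src, dst in zip(fixations, fixations[1:]):
        counts[index[src]][index[dst]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


m1 = transition_matrix("BBFFAAAABCEDCG")
for state, row in zip(STATES, m1):
    print(state, [round(p, 2) for p in row])
```

Each row that has outgoing transitions sums to 1, and the B row reproduces the three equal probabilities described in the text.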

If the observer had scanned the areas of the drawing in a different order, a different matrix could
be used to describe it. For instance, the order BFFAAAABFABBCEDCEDCGGGG would yield the
following matrix.

For modeling the scanning process, it is useful to determine how closely two scanpaths match. The
error or statistical discordance can be computed as

$$E = M_1 - M_2,$$

that is, as the difference between the two matrices. This also is a matrix (of the same size), with error
elements $e_{ij}$. A scalar measure of the statistical discordance can be calculated as the typical (average)
value of the error elements, defined as

$$\bar{e} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |e_{ij}|,$$

where $|e_{ij}|$ = the absolute value of the error matrix elements

$n$ = the matrix dimension (i.e., the number of elements in a row or column).
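
The comparison can be sketched as follows, assuming the scalar measure is the mean absolute value of the elements of the difference matrix (the sum of the absolute errors divided by the n × n element count). The helper names `transition_matrix` and `discordance` are hypothetical, and the two fixation strings are the ones quoted in the text.

```python
def transition_matrix(fixations, states="ABCDEFG"):
    """Estimate transition probabilities by counting adjacent fixation pairs."""
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for src, dst in zip(fixations, fixations[1:]):
        counts[index[src]][index[dst]] += 1
    # Rows without outgoing transitions stay all-zero (assumption).
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def discordance(m1, m2):
    """Mean absolute element of E = M1 - M2 for two n-by-n matrices."""
    n = len(m1)
    total = sum(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    return total / (n * n)


first = transition_matrix("BBFFAAAABCEDCG")
second = transition_matrix("BFFAAAABFABBCEDCEDCGGGG")
print(discordance(first, second))
```

Identical scanpaths give a value of zero. Because each row of a transition matrix sums to at most 1, the measure cannot exceed 2/n under this definition.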

Errors usually get smaller as a linear function of the length of the fixation string, when plotted on a
double  logarithmic  scale.  Errors  also  are  smaller  when  the  state  transitions  are  quasideterministic 
(probabilities  of  state  transitions  not  all  equal)  as  opposed  to  random  (state  transition  probabilities  about 
equal  throughout  the  matrix). 

This  modeling  technique  requires  that  areas  of  interest  in  a  target’s  image  be  determined  by 
measuring  eye  fixations,  and  that  a  specific  fixation  point  be  assigned  to  each  area  of  interest.  Currently 
this  is  done  by  inspection  and  somewhat  arbitrary  grouping  of  points.  Clustering  algorithms  (used  in 
cosmology)  might  be  useful  in  deciding  to  which  group  of  points  a  given  fixation  belongs. 

4.3.2  String  Editing  Models  of  Scanpaths 

Another technique suggested as a measure of scanpath similarity is string
editing. A given string of fixation sequences (e.g., BBFFAAAABCEDCG) can be compared with
another, possibly of a different length (e.g., BFFAAAABFABBCEDCEDCGGGG) to determine how
many substitution, deletion, and insertion operations are necessary to convert the first into the second.
The distance between the two strings can be considered a function of the costs assigned to each
type of operation, e.g., either the same cost for all kinds of operations, or varied costs such as 1 for
substitution, 2 for insertion, and 3 for deletion.

For example, required substitutions can be assigned a cost of 2 and both deletions and insertions a
cost of 1. To transform the string ACA to CADAC requires inserting a C at the beginning (cost: 1) and
at the end (1), and substituting a D for the C in the middle (2). The resulting total cost (and thus the
distance) is 4. As strings get longer, the ways of transforming one into another increase very fast, so it
is not trivial to find the transformation that costs least. An algorithm based on modified dynamic
programming thus has been developed that guarantees that the minimum distance between two strings
will be found. The resulting method shows promise for automating, objectifying, and quantifying the
similarity of scanpath fixations.
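
The following is a minimal sketch of a weighted string-editing distance using the standard dynamic-programming table. It is not necessarily the modified algorithm referred to above. The function name `edit_distance` and its parameter names are assumptions, and the default costs are those of the ACA example.

```python
def edit_distance(source, target, sub_cost=2, ins_cost=1, del_cost=1):
    """Minimum total cost of turning `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost      # delete every remaining source symbol
    for j in range(1, cols):
        dist[0][j] = j * ins_cost      # insert every target symbol
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]


print(edit_distance("ACA", "CADAC"))  # the worked example from the text
print(edit_distance("BBFFAAAABCEDCG", "BFFAAAABFABBCEDCEDCGGGG"))
```

Filling the table takes time proportional to the product of the two string lengths, which avoids enumerating the rapidly growing number of possible transformations.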

4.3.3  Modeling  the  Scanning  Process 

A schematic representation of a possible Stark scanpath model for an already-detected target is
shown in Figure 10. The model consists of three phases.

• Prepare a model of an actual target's scanpath (starting when detection is complete).

    • Predict or collect the target image's main features of interest.

    • Predict or collect the fixation sequence forming the target image's scanpath.

    • Develop a state transition matrix for the probability of transitions along the scanpath for this
    target.


Hacisalihzade, S.S., Stark, L.W., and Allen, J.S. Visual Perception and Sequences of Eye Movement
Fixations: a Stochastic Modeling Approach, in IEEE Transactions on Systems, Man, and Cybernetics,
vol 22, no. 3, p. 496, May/June 1992.

• Detect a probable target in the scene and prepare a model of the probable target's scanpath.

    • Predict or collect the image's main features of interest.

    • Predict or collect the fixation sequence forming the image's scanpath.

    • Develop a state transition matrix for the probability of transitions along the scanpath for this
    probable target.

• Compare the actual and probable target scanpaths to determine similarity.

    • Calculate the error between the two state transition matrices.

    • If the error is large, reject the probable target; if the error is small, accept the probable target
    as an actual target.

Figure 10. Schematic Representation of Stark's Visual Scanpath Model.

5.0 Theoretical and Computer-Based Models


5.1  Theoretical  Model  of  Visual  Search 

Stark  and  his  coworkers  have  laid  the  theoretical  groundwork  for  a  comprehensive  computerized 
Model  of  Visual  Search,  based  on  the  research  discussed  above.  However,  this  comprehensive  model 
has  not  yet  been  completed  or  implemented.  The  theoretical  model  is  a  serial  one,  with  the  various 
processes  carried  out  in  roughly  the  same  sequence  as  in  the  eye  and  brain.  When  the  program  is 
operating,  model  output  is  expected  to  be  number  of  detections,  number  of false  alarms,  and  time  to 
detect.  Theoretical  model  components  are  shown  in  Figure  11,  and  include 

•  Physics  of  the  search  scene,  targets,  and  environment.  This  component  models  the  search  area 
and  the  objects  located  there.  The  background  can  be  homogeneous  or  heterogeneous.  Target 
contrast,  size,  and  distance  will  be  modeled,  along  with  atmospheric  attenuation  and  sensor  effects 
on  the  target  image.  False  targets  in  the  search  scene  will  be  included. 

• Physics of eye movement and the retinal image. The effects of what Stark considers the three
most important factors for detection (apparent contrast, apparent size, and eccentricity with
respect to the visual axis) are modeled. Models are included both for total visual lobe detection
processes and for foveal recognition processes.

•  Human  search  behavior.  This  includes  models  for  various  search  strategies,  both  systematic  and 
random.  Both  search  patterns  and  search  paths  will  be  modeled. 

•  Higher  level  psychological  processes.  Cognitive  processes  that  control  active  vision  are 
modeled.  Factors  such  as  utility,  cost/benefit,  observer  experience  and  training,  foveal  load, 
peripheral  clutter,  vigilance,  and  fatigue  will  be  included,  as  these  affect  an  observer's  decision  to 
consider  a  detected  object  to  be  a  target. 

•  Probabilistic  physiology  of  the  eye  and  brain.  This  component  uses  decision  models  to  provide 
the  probability  of  detection  and  time  to  detect,  based  on  the  results  of  experimental  studies. 

5.2  Prototype  Computer  Version  of  Model 

While the comprehensive Model of Visual Search is far from completion, a computerized
prototype was developed between July 1992 and February 1993. The prototype first was
implemented using the Microsoft Excel spreadsheet program on a NeXT computer system, and later
was reprogrammed in the C language for a Silicon Graphics, Inc., workstation. The remainder of
Section 5 describes this prototype system.

The computerized version of the model does not at present include the proposed cognitive models.
The  scanning  process,  considered  integral  to  the  recognition  phase  of  visual  search,  is  missing.  For  now 
recognition  simply  is  a  function  of  target  size  and  contrast,  along  with  a  random  draw  to  simulate  other 
indeterminate  factors.  Also  missing  are  the  task  and  strategy  models  that  significantly  affect  the 
searching  and  scanning  processes. 


Stark,  L.  W.,  Christiansen,  J.,  and  Dixon,  D.  Final  Report:  Model  of  Visual  Search,  UC  Berkeley, 
Berkeley,  CA,  February  1993. 

Figure 11. Components of a Comprehensive Theoretical Model of Visual Search.

Since  this  is  a  preliminary  computerized  model,  simplifications  are  included  so  that  the  concepts 
can be demonstrated efficiently. That is, it is an "empty FOV model," numerous parameters are set to
constant  or  fixed  values,  and  many  other  parameters  are  predefined  and  entered  by  the  user  prior  to  each 
run.  The  user  does  not  interact  with  the  system  during  a  program  run. 

The computerized system consists of seven modules or sets of algorithms. Figure 12 shows the
relationships  among  the  program’s  modules,  which  are  listed  below,  and  Figure  13  provides  a  flowchart 
of  the  process. 


Figure 12. Relationships Among the Modules of the Prototype Computerized Model of Visual Search.

F = size of fovea (0.5 deg); T = number of targets in search scene

Figure 13. Flowchart of the Processes Included in the Prototype Model of Visual Search.

• Main program Protocol algorithms. This is the driver or control module. It initializes the
system with user-entered run parameters, calls modules as required, collects data as other modules
generate it, provides data to other modules as needed, and calculates and outputs the program's
results. Output essentially is the number of targets and decoys detected, the number of targets
recognized, and the number of false alarms (decoys recognized as targets). Search time, search
area, detections-per-glimpses ratio, and recognitions-per-glimpses ratio also may be output.

•  Search  Scene  Construction  algorithms.  This  module  sets  up  a  scene  that  will  be  searched  for 
targets.  The  target  area  scene  is  defined  as  a  checkerboard  matrix,  with  each  cell  representing  the 
visual  lobe  area  or  area  covered  in  one  glimpse  of  the  search  scene.  Targets  or  decoys  are  included 
in  some  of  the  cells.  Targets  and  decoys  are  assigned  sizes  and  contrast  values  as  well  as  locations 
in  the  simulated  field  of  view. 

•  Systematic  Row  Search  algorithms.  This  module  moves  the  simulated  visual  lobe  systematically 
over  the  scene  constructed  by  the  Search  Scene  Construction  module.  A  boustrophedon 
("windshield  wiper")  row  search  is  performed  from  upper  left  to  lower  right,  with  each  row 
dropping  into  the  one  below  without  overlap.  The  search  proceeds  in  discrete  jumps  of  constant 
size  that  simulate  glimpses. 

•  Random  Search  algorithms.  This  module  moves  the  simulated  visual  lobe  randomly  over  the 
scene  constructed  by  the  Search  Scene  Construction  module.  The  search  proceeds  in  discrete  jumps 
of  constant  size  that  simulate  glimpses;  only  the  direction  of  movement  is  randomized,  using  a 
random  number  generator. 

•  Hard  Visual  Lobe  algorithms.  This  module  provides  the  probability  of  target/decoy  detection  for 
a  given  glimpse.  A  hard  lobe  is  modeled  for  simplicity;  thus  the  detection  probability  is  constant 
for all eccentricities of the target/decoy image on the retina within three specified lobe limits (1, 3,
and 5 degrees), and zero outside these limits. Probability of detection is simply a function of
target/decoy  apparent  size  and  contrast. 

• Hard Foveal Lobe algorithms. This module provides the probability of target/decoy recognition
using foveal vision, once the target has been detected somewhere in the glimpse area. The fovea is
assumed to be 0.5 degree and target/decoy apparent size is assumed to be < 0.5 degree.
Probability of recognition is simply a function of target/decoy contrast.

•  Decision  algorithms.  This  module  determines  whether  a  target/decoy  has  been  detected  and 
whether  a  target  has  been  recognized,  that  is,  whether  the  calculated  probability  of  acquisition  will 
be  considered  adequate  for  acquisition  actually  to  have  occurred  in  the  overall  model.  A  random 
number  (0  to  1)  is  compared  with  the  probability  of  detection/recognition  provided  by  one  of  the 
two  lobe  models;  if  the  random  number  is  smaller,  the  object  is  assumed  to  have  been 
detected/recognized. This technique typically is used in combat simulations to represent random
battlefield  effects  that  are  not  being  individually  modeled. 
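
The random draw described for the Decision module can be sketched as below. This is an illustration only: the function name `decide` and the use of a seeded generator are assumptions, and 0.28 is the example detection probability quoted elsewhere in the report.

```python
import random


def decide(probability, rng=random):
    """Return True when a uniform draw in [0, 1) is smaller than the probability."""
    return rng.random() < probability


# Repeating the draw many times should give a hit rate close to the probability.
rng = random.Random(1993)  # seeded only so the illustration is repeatable
trials = 20000
hits = sum(decide(0.28, rng) for _ in range(trials))
print(hits / trials)
```

A probability of 0 never yields a detection and a probability of 1 always does, because the draw lies in the half-open interval from 0 to 1.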

5.3  User-Entered  and  Predefined  Data 

For this prototype system, the user specifies the following parameters for each run. In general,
default values are provided which the user may change if desired. An example spreadsheet form for
partial data entry and data results is shown in Figure 14.




                   lobe         scene         stat         detect   recog
worksheet names    lobe.View1   scene.View1   stat.View1
target                          [1.649]       15           5
decoy1                          [3.335]       74           10
decoy2                          [2.241]       66           20
decoy3                          [0.843]       62           9
bk-noise                        rand*0.4
false-alarm                                                         5
Pd-fovea           0.28
Pd-periphery1      0.27
threshold-detect                                           [>0.5]
Pr-fovea           0.9

Figure 14. Example Data Entry Worksheet. The column labeled lobe includes the probability of detection
and of recognition values for the 1-degree, 3-degree, and 5-degree eccentricity levels of the visual lobe.
The scene column provides contrast values for the targets and the three kinds of decoys, plus the formula
to be used for generating the level of background noise. The stat column lists the total numbers for targets
and for each kind of decoy. In the detect column, the value [>0.5] represents the size threshold to be used
for detection, while the numbers at the top are the results returned by the program for number of
detections. The recog column shows the number of targets correctly recognized and the number of false
alarms, at the end of a run. [See Footnote 15]

1. Scene characteristics:

a. Search scene size. Scenes are predefined on separate worksheets as scene.View1,
scene.View2, etc., characterized by M and N, the number of rows and columns of large cells
(each representing one 5-degree visual lobe or glimpse area) in the total search scene. MN is
the number of glimpses required to cover the scene systematically. Figure 15 shows a typical
search scene.

b. Lobe size. Visual lobes are predefined on separate worksheets as lobe.View1, lobe.View2,
etc., characterized by m and n, the number of rows and columns of 1-degree small cells
(representing eccentricity levels) included in a single 5-degree large cell.

c. Scene clutter level. Clutter is defined by the value that will be used for background noise
(bk-noise) in the scene, e.g., (0.4 * a random number).

d. Number of targets. The user enters T, the total quantity of targets that will be located in all
of the cells, e.g., 15.

e. Target contrast. The user specifies $C_T$, a value representing target-to-background contrast,
in the target-scene cell, e.g., [1.649].

Figure 15. Example of a Search Scene. Each large square (upper left) represents 5 degrees of visual
angle, the visual lobe size set for one glimpse of the scene. A small square (center of the large square)
represents 1 degree of the visual lobe. A systematic row search pattern is illustrated here. The letter T
[...] decoy. The center 1-degree square within a 5-degree total [...] of foveal vision (0.5 degree) plus
0.5 degree of peripheral vision; the number of squares from this central square represents the
eccentricity (distance into the periphery) of a target/decoy. [See Footnote 15]


f. Number of Decoys (Type 1). The user specifies $D_1$, the total quantity of this kind of decoy
that will be located in all of the cells, in the decoy1-stat cell, e.g., 74.

g. Decoy (Type 1) contrast. The user specifies $C_{D1}$, a value representing decoy-to-background
contrast, in the decoy1-scene cell, e.g., [3.335].

h. Number of Decoys (Type 2). As with decoy1, $D_2$, the total quantity of this kind of decoy
that will be located in all of the cells, e.g., 66.

i. Decoy (Type 2) contrast. As with decoy1, $C_{D2}$, a value representing decoy-to-background
contrast, e.g., [2.241].

j. Number of Decoys (Type 3). As with decoy1, $D_3$, the total quantity of this kind of decoy
that will be located in all of the cells, e.g., 62.

k. Decoy (Type 3) contrast. As with decoy1, $C_{D3}$, a value representing decoy-to-background
contrast, e.g., [0.843].

2. Search model to be used. Either Systematic Row Search or Random Search may be
specified.

3. Number of glimpses. G, the total number of glimpses that will be used for the search, is specified
(used only for the Random Search module).

4. Detection and recognition probabilities. Values from 0 to 1 are used for $P_{d1}$ for the 1-degree,
$P_{d2}$ for the 3-degree, and $P_{d3}$ for the 5-degree lobe limits (e.g., 0.28, 0.27, and 0.24), and for $P_r$ for the
probability of foveal recognition (e.g., 0.9).

Several  parameters  are  fixed  for  all  runs.  These  include: 

1.  Target  size.  This  is  referred  to  as  oj,  fixed  at  0.5  degree,  or  2  degrees  per  cycle. 

2. Decoy size. These are referred to as and fixed at 0.5 degree, or 2 degrees per cycle.

3. Foveal lobe size. Lobe size F is fixed at 0.5 degree.

5.4  Main  Protocol  Module 

The  Protocol  module  provides  the  driver  or  main  control  algorithms.  All  other  modules 
communicate  via  the  Protocol  module. 

Data  Inputs. 

All  of  the  user  inputs  listed  above  are  input  to  the  Protocol  module,  for  distribution  to  other 
modules  as  needed. 

Data  Processing. 

1. Compare the user-entered scene size and lobe size to determine that they are compatible.

2. Invoke the Search Scene Construction module, pass required parameters, and collect outputs.

3. Select and invoke either the Systematic Row Search module or the Random Search module for the
search process (based on the user's entry), pass required parameters, and collect outputs.

4.  Invoke  the  Hard  Visual  Lobe  module,  pass  required  parameters,  and  collect  outputs. 

5.  Invoke  the  Decision  module,  pass  required  parameters,  and  collect  its  outputs  concerning  whether 
an  individual  target  or  decoy  was  detected. 

6.  Tabulate  the  Decision  module  outputs  to  determine  the  total  number  of  objects  detected. 




7.  Invoke  the  Hard  Foveal  Lobe  module,  pass  required  parameters,  and  collect  outputs  concerning 
whether  an  individual  object  was  recognized. 

8. Invoke the Decision module, pass required parameters, and collect its outputs concerning whether
individual targets were recognized and whether decoys were recognized as targets.

9.  Tabulate  the  Decision  module  outputs  to  determine  the  total  number  of  targets  recognized  and  the 
number  of  false  alarms  (decoys  falsely  recognized  as  targets). 

Module  Outputs. 

1. Total number of detections of targets, $T_d$, and decoys, $D_{1d}$, $D_{2d}$, $D_{3d}$.

2. Total number of targets recognized, $T_r$.

3. Total number of false alarms (decoys recognized as targets), $T_{fa}$.

4. Total search time (from Systematic Row Search module only): $t_s$.

5. Total search area covered (from Random Search module only): $((x, y)_{min}, (x, y)_{max})$.

6. Cumulative detections-per-glimpses ratio: $(T_d + D_{1d} + D_{2d} + D_{3d})/G$.


5.5  Search  Scene  Construction  Module 

This  module  sets  up  the  very  simple  scene  that  will  be  searched  for  targets  (see  Figure  15).  The 
scene  represents  an  apparently  random  distribution  of  targets  and  decoys  (similar  to  scenes  illustrated  in 
Figures 2 and 3, not that of Figure 4). Search scene parameters include the number of large and small
cells  in  the  matrix,  cell  contents  as  targets  or  decoys,  and  the  contrast  and  size  of  targets  and  decoys. 
About  1  to  2  per  cent  of  cells  typically  may  contain  targets  and  3  to  6  per  cent  contain  decoys. 


As shown in Figure 15, the search scene is divided into a rectangular matrix of larger cells each
representing  the  visual  lobe  area  covered  by  a  glimpse  (taken  here  to  be  5  degrees  wide  and  5  degrees 
high).  Each  visual  lobe  cell  is  further  divided  into  smaller  cells  each  representing  a  small  portion  of  the 
visual  lobe  area  (25  cells,  each  1  degree  wide  by  1  degree  high,  in  the  example),  used  to  simulate  image 
eccentricity  in  the  visual  lobe.  That  is,  one  probability  of  detection  can  be  assigned  if  the  target/decoy  is 
located  in  the  center  small  cell,  a  lower  probability  can  be  assigned  if  it  is  in  the  "ring"  just  outside  the 
center  small  cell,  and  a  still  lower  value  can  be  assigned  for  the  outermost  "ring"  of  small  cells  within  a 
large  visual  lobe  cell  (see  Figure  16).  If  there  are  36  large  visual  lobe  cells  in  the  scene  and  25  small 
cells  per  large  cell,  the  result  is  a  matrix  of  900  cells. 
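
As a sketch of how the three eccentricity levels might be looked up, the example below treats the "rings" as square rings of small cells around the center of a 5 × 5 lobe. The function name `lobe_probability`, the zero-based cell indices, and the square-ring interpretation are assumptions. The probabilities 0.28, 0.27, and 0.24 are the example values quoted elsewhere in the report for the 1-, 3-, and 5-degree limits.

```python
def lobe_probability(row, col, probs=(0.28, 0.27, 0.24), size=5):
    """Detection probability for a small cell (row, col) inside one visual lobe cell.

    Ring 0 is the center cell, ring 1 the eight cells around it, and ring 2
    the sixteen outermost cells. Positions outside the lobe return zero,
    which is what makes this a "hard" lobe.
    """
    if not (0 <= row < size and 0 <= col < size):
        return 0.0
    center = size // 2
    ring = max(abs(row - center), abs(col - center))
    return probs[ring]


# Print the probability assigned to each of the 25 small cells.
for r in range(5):
    print([lobe_probability(r, c) for c in range(5)])

# The arithmetic from the text: 36 lobe cells of 25 small cells each.
print(36 * 25)
```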


Data  Inputs. 

1. The number of rows, M, and the number of columns, N, of large visual lobe cells in the total search
scene.

2. The number of rows, m, and the number of columns, n, of small cells in each larger visual-lobe cell.

3. Number of targets in the total search scene, T.

4. Number of decoys of each type in the scene, $D_1$, $D_2$, $D_3$.

[Plot: probability of detection (vertical axis) versus simulated degrees of eccentricity from the visual axis, from -5 to 5 degrees (horizontal axis)]

Figure  16.  Representation  of  a  Three-Level  Hard  Visual  Lobe.  The  closer  the  image  is  to  the  visual  axis, 
the  higher  the  probability  of  detection.  Eccentricity  levels  shown  here  (1  degree,  3  degrees,  and  5  degrees) 
are  the  same  as  used  in  Figure  15,  within  a  given  simulated  5-degree  visual  lobe.  [See  Footnote  15] 

Data  Processing. 

1. The location of the $i$th target/decoy has coordinates $(x_i, y_i)$ within a foveal lobe cell located at
$(X_I, Y_J)$. Possible target/decoy locations range from $x_i = (1, \ldots, Nn)$ and $y_i = (1, \ldots, Mm)$.

2. Determine the location of the $i$th target/decoy by drawing a random number:
$x_i = \mathrm{RAND}(1, \ldots, Nn)$, $y_i = \mathrm{RAND}(1, \ldots, Mm)$.

3. To ensure that no more than one target/decoy occupies a single cell, include the constraint
$(x_i, y_i) \neq (x_j, y_j)$ when $i \neq j$,
for $i = 1, \ldots, (T + D_1 + D_2 + D_3)$; $j = 1, \ldots, (T + D_1 + D_2 + D_3)$.

4. Continue the process for a total of $(T + D_1 + D_2 + D_3)$ times.

Module  Outputs. 

1. For each target/decoy:

a. Object type. Target, Decoy1, Decoy2, or Decoy3 (probably identified by its assigned
contrast value).

b. Object location, $(x_i, y_i)$.
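
The placement step can be sketched as follows. Rather than redrawing a location whenever two objects collide, this example samples distinct small cells without replacement, which satisfies the same one-object-per-cell constraint. The function name `build_scene`, the dictionary layout, and the 6 × 6 scene of 5 × 5 lobes are assumptions for illustration. The object counts are the example values from the data entry worksheet.

```python
import random


def build_scene(M, N, m, n, counts, rng=None):
    """Place objects at distinct small-cell locations (x, y), numbered from 1.

    M, N: rows and columns of large visual lobe cells.
    m, n: rows and columns of small cells within each lobe cell.
    counts: mapping of object type to the number of objects of that type.
    """
    rng = rng or random.Random()
    width, height = N * n, M * m
    total = sum(counts.values())
    if total > width * height:
        raise ValueError("more objects than cells")
    # Sampling without replacement guarantees no two objects share a cell.
    cells = rng.sample(range(width * height), total)
    labels = [name for name, k in counts.items() for _ in range(k)]
    scene = {}
    for cell, label in zip(cells, labels):
        x, y = cell % width + 1, cell // width + 1
        scene[(x, y)] = label
    return scene


counts = {"target": 15, "decoy1": 74, "decoy2": 66, "decoy3": 62}
scene = build_scene(6, 6, 5, 5, counts, random.Random(0))
print(len(scene), "objects placed")
```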

5.6  Search  Modules 

5.6.1  Systematic  Row  Search  algorithms 

This module moves the simulated visual lobe systematically over the scene constructed by the
Search Scene Construction module (see Figure 15). A boustrophedon ("windshield wiper") row search
is performed from upper left to lower right, with each row dropping to the one directly below (rather
than returning to the left-most column) and with no overlap. The search proceeds in discrete jumps of
constant size that simulate glimpses. Thus the search scene is viewed section by section, where each
section is equal in area to that of the simulated visual lobe (one glimpse).

Data  Inputs. 

1. The number of rows, M, and the number of columns, N, of large visual lobe cells in the total search
scene. 

Data  Processing. 

1. Label the search scene rows $X_I$ and the columns $Y_J$. Each $X_I$ and $Y_J$ intersect a section
equal to the visual lobe size.

2. Start at $X_1 Y_1$. This is the first glimpse.

3. Hold $X_1$ constant and increment $Y_1$ to $Y_2$.

4. Continue this process until $Y_J = Y_N$, then proceed to $X_2 Y_N$.

5. Hold $X_2$ constant and decrement $Y_N$ until $Y_J = Y_1$. Continue this process until
$X_I = X_M$, $Y_J = Y_N$.

6. Calculate search time, $t_s$, as [(1/3 second) * (MN glimpses)].

Module  Outputs. 

1. A visual lobe cell to be searched during a given glimpse (for use by the Visual or Foveal Lobe
module): $(X_I, Y_J)$.

2. Search time, $t_s$.
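
A minimal sketch of the row search order and the search-time calculation is given below. The function names `row_search_order` and `search_time` are hypothetical. The 1/3-second glimpse duration is the value used in the search-time formula above, and the 6 × 6 scene is an assumed example size.

```python
def row_search_order(M, N):
    """Return the (row, column) lobe cells in boustrophedon order.

    Odd-numbered rows are scanned left to right and even-numbered rows
    right to left, so each row drops directly to the one below.
    """
    order = []
    for row in range(1, M + 1):
        cols = range(1, N + 1) if row % 2 == 1 else range(N, 0, -1)
        order.extend((row, col) for col in cols)
    return order


def search_time(M, N, seconds_per_glimpse=1 / 3):
    """Time to cover the whole scene: one glimpse per lobe cell."""
    return M * N * seconds_per_glimpse


print(row_search_order(2, 3))
print(search_time(6, 6))
```

Note that with this ordering the search finishes in the last column only when the number of rows is odd; with an even number of rows it finishes in the first column.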

5.6.2  Random  Search  algorithms 

This  module  moves  the  simulated  visual  lobe  randomly  over  the  scene  constructed  by  the  Search 
Scene  Construction  module  (Figure  15).  The  search  proceeds  in  discrete  jumps  that  simulate  glimpses; 
both the direction and distance of movement are randomized, using a random number generator.
Movement  is  possible  in  eight  directions,  except  for  the  outer-most  rows  and  columns.  The  search  scene 
is  viewed  section  by  section,  where  each  section  is  equal  in  area  to  that  of  the  simulated  visual  lobe  (one 
5-degree  glimpse,  in  the  example). 

Data  Inputs. 

1. The number of rows, M, and the number of columns, N, of large visual lobe cells in the total search
scene.

2. The number of rows, m, and the number of columns, n, of small cells in each larger visual-lobe cell.

3. Maximum number of glimpses to be used for the search process, G.

Data  Processing. 

1. Each visual lobe cell has a location $(X_i, Y_j)$ in the search scene (which consists of M rows and N
columns). The visual lobe cell is composed of m rows and n columns of small cells, each with
location $(x_i, y_j)$ within the lobe cell.

2. Referring to Figure 15, the center-most small cell in a visual lobe cell represents the foveal area,
and has location $(x_c, y_c)$. When the observer's glimpse moves from the center of the current visual
lobe cell, it must cross $\frac{1}{2}(m - 1)$ small cells in the y-axis and $\frac{1}{2}(n - 1)$ small cells in the x-axis
of the current visual lobe cell, then another comparable amount in the target visual lobe. Thus the
offset of the center location $(x_c, y_c)$ from one glimpse to the next is:

$x_{offset} = 2 \cdot \frac{1}{2}(n - 1)$
$y_{offset} = 2 \cdot \frac{1}{2}(m - 1)$

3. The number of possible new centers for the lobe must be reduced by the offset values:

$x_{possible} = [\ldots] - x_{offset}$
$y_{possible} = [\ldots] - y_{offset}$

4. If the current glimpse is $i$, and the current location of the center of the visual lobe is $(x_i, y_i)$, then
the next location of the center of the visual lobe is $(x_{i+1}, y_{i+1})$, where

$x_{i+1} = \mathrm{RAND}(1, x_{possible}) + \frac{1}{2} x_{offset}$
$y_{i+1} = \mathrm{RAND}(1, y_{possible}) + \frac{1}{2} y_{offset}$

5. Continue random search process until $i$ = maximum number of glimpses specified.

6. Compute the range of the search over the total search area, $((x, y)_{min}, (x, y)_{max})$:

$x_{min} = x_{i+1} - \frac{1}{2} x_{offset}$, $y_{min} = y_{i+1} - \frac{1}{2} y_{offset}$

$x_{max} = x_{i+1} + \frac{1}{2} x_{offset}$, $y_{max} = y_{i+1} + \frac{1}{2} y_{offset}$

Module  Outputs. 

1. Visual lobe cell to be searched during a given glimpse (for use by the Visual or Foveal Lobe
module): $(X_i, Y_j)$.

2. Search area covered by the Random Search module: $((x, y)_{min}, (x, y)_{max})$.
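
A minimal sketch of the random placement of lobe centers follows. Several points are assumptions rather than statements from the source: the quantity from which the offsets are subtracted is not legible, so the sketch takes it to be the total number of small-cell columns (N·n) and rows (M·m); the half-offset added to each random draw is likewise a reading of a damaged formula; m and n are assumed to be odd; and the names `random_search` and `coverage` are hypothetical. Each new center is drawn uniformly, which is a simplification of the eight-direction movement described above.

```python
import random


def random_search(M, N, m, n, G, seed=None):
    """Return G lobe-center locations (x, y) in small-cell coordinates (1-based)."""
    rng = random.Random(seed)
    x_offset = n - 1  # 2 * (1/2)(n - 1)
    y_offset = m - 1  # 2 * (1/2)(m - 1)
    # Assumption: possible centers = total small cells minus the offset.
    x_possible = N * n - x_offset
    y_possible = M * m - y_offset
    centers = []
    for _ in range(G):
        x = rng.randint(1, x_possible) + x_offset // 2
        y = rng.randint(1, y_possible) + y_offset // 2
        centers.append((x, y))
    return centers


def coverage(center, m, n):
    """Return ((x_min, y_min), (x_max, y_max)) covered by a lobe at center."""
    x, y = center
    half_x, half_y = (n - 1) // 2, (m - 1) // 2
    return (x - half_x, y - half_y), (x + half_x, y + half_y)
```

Under these assumptions every lobe lies wholly inside the scene, because the center can never come closer than half a lobe width to any edge.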

5.7  Lobe  Detection  Modules 

5.7.1  Hard  Visual  Lobe  algorithms 

This  module  provides  the  probability  of  target/decoy  detection  using  the  total  visual  lobe,  including 
both  foveal  and  peripheral  vision.  The  visual  lobe  is  treated  as  a  three-level  "cookie  cutter"  detection 
apparatus,  with  annular  "rings"  (actually  squares)  subtending  1  degree,  3  degrees,  and  5  degrees 
centered  on  the  visual  axis,  within  the  modeled  5-degree  visual  field  for  a  single  glimpse  (see  Figure 
16).  Since  a  three-level  hard  lobe  is  modeled,  detection  probabilities  are  constant  for  all  eccentricities 
of  the  target/decoy  image  on  the  retina  within  the  specified  lobe  limits,  and  zero  outside  these  limits. 

Within a given lobe "ring," probability of detection is simply a function of target/decoy size, $\omega$,
and contrast, $C$. For these simulations, the target was fixed at 0.5 degree in size; in terms of spatial
frequency, $\omega$ = 2 cycles/degree. Thus the probability of detection is a function only of the assigned
target/decoy contrast values.

Data  Inputs. 

1. Visual lobe cell to be searched during this glimpse (from the Systematic or Random Search
module), $(X_i, Y_j)$.

2. For any target/decoy in that cell:

a. Location in the large cell, $(x_i, y_j)$.

b. Contrast value for that target, $C_T$, or for that type of decoy, $C_{D1}$, $C_{D2}$, or $C_{D3}$.

c. Size, $\omega_T$ or $\omega_D$, fixed at 0.5 degree (2 cycles per degree).

3. Probability of detection as a function of eccentricity within the visual lobe cell, $P_d$.

Data  Processing. 

1. Determine whether a target/decoy is present in this cell. If so, go to Step 2. If not, get another
visual lobe cell from the Systematic or Random Search module.

2. Determine from its $(x_i, y_j)$ position in which of the three lobe levels (1 degree, 3 degrees, or 5
degrees of eccentricity) the target/decoy is located, and assign it an appropriate table of values of
detection probabilities (as a function of contrast) for that lobe level.

3. For a given lobe level, use the general equation for hard lobe detection, $P_d = f(C, \omega)$. Since $\omega$ is
fixed at 2 cycles/degree, $P_d = f(C)$.

4. Use the appropriate table to select the applicable value for $P_d$, for a given eccentricity level and
contrast level, e.g.,


    C       (unlabeled)    Simulation    Experimental
    0.08    12             0.1           0.05
    0.20    5              0.5           0.46
    0.33    3              0.9           0.97

5.  Get  the  next  visual  lobe  cell  from  the  Systematic  or  Random  Search  module,  and  repeat.  Continue 
this  process  until  all  search  scene  cells  have  been  reviewed. 

6.  Calculate  the  detections-per-glimpses  ratio  for  all  objects  (see  Figure  17): 

$(D_T + D_{D1} + D_{D2} + D_{D3})/G$.

Module  Outputs. 

1. Probability of detection $P_d$ for each target/decoy in each visual lobe.

2. Cumulative detections-per-glimpses ratio for all objects:

$(D_T + D_{D1} + D_{D2} + D_{D3})/G$.
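
The lookup described in the processing steps might be sketched as follows. The sketch assumes that the position of the target/decoy is given as an offset in degrees from the visual axis, and that each square "ring" extends half of its stated width on either side of the axis; neither detail is stated explicitly in the source. The function names are hypothetical. The example table holds the simulation values quoted above, and the source does not give separate tables for the three lobe levels.

```python
# Simulation values from the example table above (contrast -> P_d).
EXAMPLE_PD = {0.08: 0.1, 0.20: 0.5, 0.33: 0.9}


def lobe_level(dx_deg, dy_deg):
    """Return the lobe level (1, 3, or 5 degrees), or None if outside the lobe."""
    eccentricity = max(abs(dx_deg), abs(dy_deg))  # square "rings"
    for width in (1, 3, 5):
        if eccentricity <= width / 2:
            return width
    return None


def detection_probability(dx_deg, dy_deg, contrast, tables):
    """Look up P_d; tables maps lobe level -> {contrast: P_d}."""
    level = lobe_level(dx_deg, dy_deg)
    if level is None:
        return 0.0  # hard lobe: zero probability outside the lobe limits
    return tables[level][contrast]


def detections_per_glimpse(d_target, d_decoy1, d_decoy2, d_decoy3, glimpses):
    """Cumulative detections-per-glimpses ratio for all object types."""
    return (d_target + d_decoy1 + d_decoy2 + d_decoy3) / glimpses
```

In use, a caller would supply one contrast table per lobe level; reusing `EXAMPLE_PD` for all three levels is only a stand-in for demonstration.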

[Figure 17: plot of cumulative targets/decoys detected versus number of glimpses.]

Figure 17. Example of Cumulative Detections as a Function of Number of Glimpses. This example
assumes that the search scene contains 50 visual lobe cells that include a total of 5 targets. Multiple
passes (50 glimpses each) are made over the search scene. Number of targets/decoys detected is
provided for four search conditions: systematic, with probability of detection of 1.0 and 0.3, and random,
with the same probabilities of detection. [See Footnote 15]
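
The conditions in the Figure 17 example can be approximated with a short simulation. The sketch below is hypothetical and is not the program that produced the figure: it assumes at most one target per cell, that a detected target stays detected, and that a random glimpse may land on any cell with equal probability. The function name and its parameters are illustrative only.

```python
import random


def cumulative_detections(mode, p_detect, n_cells=50, n_targets=5,
                          passes=3, seed=0):
    """Return the cumulative number of targets detected after each glimpse."""
    rng = random.Random(seed)
    targets = set(rng.sample(range(n_cells), n_targets))
    found = set()
    curve = []
    for g in range(passes * n_cells):
        if mode == "systematic":
            cell = g % n_cells           # each pass covers every cell once
        else:
            cell = rng.randrange(n_cells)  # random glimpse location
        if cell in targets and cell not in found and rng.random() < p_detect:
            found.add(cell)
        curve.append(len(found))
    return curve
```

With systematic search and a detection probability of 1.0, all five targets are found by the end of the first pass; the other three conditions generally need more glimpses.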


5.7.2  Hard  Foveal  Lobe  algorithms 

This module provides the probability of target/decoy recognition using foveal vision, once the
target has been detected somewhere in the visual lobe. The target/decoy closest to the center of the
visual axis during a given glimpse is selected for further inspection and possible recognition.

The fovea is modeled as subtending 0.5 degree. Since acuity is uniform over the foveal area,
recognition probabilities are constant for all eccentricities of the target/decoy image on the fovea.
Target/decoy size is assumed to be < 0.5 degree, so probability of recognition is simply a function of
target/decoy contrast.

Data  Inputs. 

1. Visual lobe cell to be searched during this glimpse (from the Systematic or Random Search
module), $(X_i, Y_j)$.

2. For any target/decoy in that cell:

a. Type: Target, Decoy1, Decoy2, or Decoy3.

b. Location in the visual lobe, $(x_i, y_j)$.

c. Contrast value for that target, $C_T$, or type of decoy, $C_{D1}$, $C_{D2}$, or $C_{D3}$.

d. Size, $\omega_T$, $\omega_{D1}$, $\omega_{D2}$, or $\omega_{D3}$, fixed at 0.5 degree (2 cycles per degree).

3. Foveal lobe size F, fixed at 0.5 degree.

4. Probability of recognition within the visual lobe cell, $P_r$.

Data  Processing. 

1. Determine whether a target/decoy is present in this cell. If so, go to Step 2. If not, get another
visual lobe cell from the Systematic or Random Search module.

2. Use the general equation for hard lobe recognition, $P_r = f(C, \omega)$. Since $\omega$ is fixed at 2
cycles/degree, $P_r = f(C)$.

3. Use the appropriate table to select the applicable value for $P_r$ as a function of contrast, e.g.,


    C       (unlabeled)    Simulation                         Experimental
    0.08    12             (not provided in documentation)    (not provided in documentation)
    0.20    5
    0.33    3

4. Get the next visual lobe cell from the Systematic or Random Search module, and repeat. Continue
this process until all Search module cells have been reviewed.

Module  Outputs. 

1. Probability of recognition $P_r$ for the target/decoy closest to the visual axis of the visual lobe cell.
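
The selection of the object closest to the visual axis can be sketched as below. The source does not say how closeness is measured; Euclidean distance in small-cell coordinates is assumed here, and the function name and the tuple layout are hypothetical.

```python
import math


def closest_to_axis(objects, center):
    """Pick the target/decoy nearest the lobe center.

    objects -- list of (label, x, y) tuples for objects in the lobe cell
    center  -- (x_c, y_c), location of the center-most (foveal) small cell
    """
    if not objects:
        return None  # nothing to inspect in this glimpse
    return min(
        objects,
        key=lambda obj: math.hypot(obj[1] - center[0], obj[2] - center[1]),
    )
```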

5.8  Decision  Module 

This module determines whether a target/decoy has been detected and also whether a target has
been recognized. A random number (0 to 1) is compared with the probability of detection/recognition
provided by the lobe construction algorithms. If the random number is smaller, the object is assumed to
have been detected/recognized.

Data  Inputs. 

1. Probability of detection, $P_d$, for each target/decoy detected in the Visual Lobe module.

2. Probability of recognition, $P_r$, for each target/decoy recognized in the Foveal Lobe module.

3. Type of object that was recognized as a target: Target, Decoy1, Decoy2, or Decoy3 (probably
identified by its assigned contrast value).

Data  Processing:  Detection. 

1. Determine whether a target/decoy has been detected in this cell. If so, determine $P_d$ for that
target/decoy and go to Step 2. If not, get another visual lobe cell from the Hard Visual Lobe
module.


2. Obtain a random number, $r$, from a random number generator such that 0 < $r$ < 1.

3. Compare $r$ with $P_d$. If $r < P_d$, then the object is detected.

Data  Processing:  Recognition. 

1. Determine whether a target/decoy has been recognized in this cell. If so, determine $P_r$ for that
target/decoy and go to Step 2. If not, get another visual lobe cell from the Hard Foveal Lobe
module.

2. Obtain a random number, $r$, from a random number generator such that 0 < $r$ < 1.

3. Compare $r$ with $P_r$. If $r < P_r$, then the object is recognized as a target.

4. Determine whether that object actually is a target or is a decoy. If it is a target, the result is
considered to be a recognition; if not, it is considered to be a false alarm.

Module  Outputs. 

1. Logical output for detection: Detection = true/false.

2.  Logical  output  for  recognition:  Recognition  =  true/false;  False  Alarm  =  true/false. 
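
The decision logic can be sketched as follows. This is an illustrative sketch only: the function name `decide` and its result keys are hypothetical, recognition is assumed to be attempted only after a successful detection, and Python's `random.random()` draws from [0, 1) rather than the open interval given in the text.

```python
import random


def decide(p_detect, p_recognize, is_target, rng=random):
    """Return the logical outputs of the Decision module for one object."""
    result = {"detection": False, "recognition": False, "false_alarm": False}
    # Detection: the object is detected if the random draw is below P_d.
    if rng.random() < p_detect:
        result["detection"] = True
        # Recognition: the object is taken to be a target if the draw is below P_r.
        if rng.random() < p_recognize:
            if is_target:
                result["recognition"] = True
            else:
                result["false_alarm"] = True  # a decoy accepted as a target
    return result
```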

6.0 Discussion and Assessment


This  report  provides  a  brief  summary  of  the  research  and  models  of  Dr.  Lawrence  W.  Stark  as 
these  relate  to  searching  for  and  scanning  objects  in  the  visual  field.  Information  that  can  be  applied  to 
the  modeling  of  visual  search  for  personnel,  ground  vehicles,  and  helicopters  in  cluttered  terrain  has 
been  of  particular  interest  here.  The  following  comments  are  provided  to  assess  the  applicability  of 
Stark’s  theories  to  U.S.  Army  search  and  target  acquisition  (STA)  conditions  and  situations. 

1.  Concepts  and  terminology.  Stark's  distinction  between  searching  and  scanning  is  a  useful  one 
(Section  2.2).  Terms  such  as  search  patterns,  search  paths,  and  scanpaths  —  with  clear-cut 
definitions  of  the  terms  —  should  aid  significantly  in  modeling  the  serial  STA  processes. 

Usefulness and promise: Stark’s terms and definitions generally should be useful for
Army modeling of serial STA processes.

Discrepancies and omissions: The definitions used by Stark are not the same as those
used by many other target acquisition researchers for the same terms, and may be confusing
unless clearly defined each time they are used. The term scanning, in particular, usually is used for
moving (slewing) sensor systems over the field of view rather than for inspecting a single
target. With this caveat, the terms and definitions appear to be complete enough for serial
models. However, if parallel STA processes are modeled, the search path and scanpath
concepts (which are inherently linear in nature) should not be used, to minimize confusion.

2.  Defining  regions  of  interest  and  setting  thresholds.  Regions  of  interest  in  the  visual  scene 
possibly  could  be  used  to  reduce  computing  time  (Section  3.4).  A  detection  threshold,  defined  for 
each  region,  is  needed  to  specify  the  degree  of  similarity  between  template  and  target  necessary  for 
a  match. 

Usefulness  and  promise:  A  similar  approach  has  been  proposed  for  other  STA  models. 
It  is  appealing  in  its  apparent  simplicity. 

Discrepancies  and  omissions:  Defining  regions  of  interest  in  natural  scenes  must  be 
done  carefully,  to  avoid  biasing  predictions:  if  some  regions  are  “more  interesting”  than 
others, criteria that set objective levels of “interest” are needed. Defining appropriate
thresholds  for  what  constitutes  a  match  is  a  major  problem,  in  the  absence  of  applicable 
experimental  results.  While  the  concept  is  intriguing,  implementation  of  this  idea  is  very 
difficult.  Much  research  is  needed  to  define  criteria  for  what  constitutes  a  region  of  interest. 
Similarly, studies must be carried out to determine threshold values that will result in
predictions  that  match  field  test  results. 

3.  Independence  of  glimpses  versus  Markov  processes.  Koopman’s  detection  theory  assumes  that 
each glimpse or fixation is independent of the others (Section 3.1). Stark has demonstrated that
such  independence  does  not  exist  in  a  structured,  cluttered  scene.  There  is  a  significant  cognitive 
component  to  the  search  task  that  apparently  links  the  location  of  one  glimpse  with  that  of  the  next. 

Usefulness  and  promise:  Use  of  mathematical  models  that  require  the  assumption  of 
independent  events  may  not  be  appropriate  for  representing  search  in  battlefield  terrain. 
Markov  models  (which  assume  that  the  next  event  depends  on  the  directly  previous  event) 
may  be  more  applicable,  as  Stark  suggests  (Section  4.3.1). 

Discrepancies and omissions: The Markov matrix technique proposed by Stark
assumes that two sequences of eye movements are available for comparison (as does the
string editing technique; Section 4.3.2). That is, an observer’s scanpath and resulting
frequency matrix for a given target are known. The observer’s scanpath and matrix for an
unknown object then are determined, and the matrices are compared to determine whether
they represent eye movements for the same object. At least two problems come to mind:
(1) how closely alike must the two matrices be to constitute a match (the threshold problem
again), and (2) since scanpaths appear to be unique to both the observer and the object
observed, how can these be generalized for use in combat modeling? Significant
experimental research would be needed to answer these questions.
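
To make the threshold problem concrete, the sketch below builds a first-order transition matrix from a sequence of fixated regions and compares two such matrices. It is a hypothetical illustration, not Stark's procedure: regions are assumed to be coded as integers starting at 0, and the distance measure (mean absolute difference per row) is chosen here only as an example. The source does not specify how the matrices are compared, which is exactly the open question.

```python
def transition_matrix(sequence, n_regions):
    """Row-normalized frequencies of moving from region a to region b."""
    counts = [[0] * n_regions for _ in range(n_regions)]
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def matrix_distance(p, q):
    """Assumed measure: summed absolute difference, averaged over rows.

    0.0 means identical matrices; 2.0 means every row is completely different.
    """
    total = sum(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))
    return total / len(p)
```

Any match criterion would then take the form `matrix_distance(p, q) <= threshold`, with the threshold value still to be determined experimentally.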

4. Serial versus parallel processing. Stark’s model is based on the concept that sequential eye
movements drive the STA process (referred to as the serial recognition hypothesis; see Section
4.1). The eye focuses on a given point in the scene and processes information there, then moves to
a  different  point,  where  new  information  is  processed.  The  possibility  that  cognitive  processing  of 
information  from  the  first  point  continues  after  the  eye  has  moved  on  is  not  included  in  the  model, 
nor  is  the  possibility  that  no  processing  occurs  until  after  several  points  have  been  examined  and 
now  can  be  compared. 

Usefulness  and  promise:  For  practical  purposes,  a  serial  model  is  easy  to  visualize  and 
to  program.  For  now,  this  appears  to  be  a  reasonable  approach  in  order  to  obtain  workable 
computer  programs. 

Discrepancies  and  omissions:  For  the  long  run,  modeling  of  human  parallel  processing 
of  information  used  for  STA  should  be  considered.  The  times  required  for  detection, 
recognition,  and  identification  could  be  significantly  different  if  perceptions  about  individual 
points  were  not  reviewed  individually  but  instead  were  analyzed  concurrently. 

5. Different processing for familiar and random scenes. Stark differentiates between search
patterns (used for random distributions of objects in a scene) and search paths (used for visually
covering a “natural” scene). This is an important distinction (Sections 3.2 and 3.3). Only the
latter  has  real  bearing  on  military  STA,  yet  much  research  has  focused  on  finding  letters  or  other 
simple  objects  placed  randomly  in  non-natural,  artificially  structured  (or  plain)  backgrounds. 

Usefulness and promise: Research (by Stark and others) on eye movements and object
location in “non-natural” backgrounds has provided useful groundwork for STA modeling.
However, data from such studies must be used with extreme caution when extrapolating to
human perception of objects in natural scenes. The latter situation apparently relies on a
completely different search process, driven by the scene and the observer’s expectations.

Discrepancies and omissions: Stark cites research on both search patterns and search
paths. However, the computer model discussed in this report embeds targets and decoys
essentially randomly in an artificially-structured background, so that search pattern eye
movements are modeled. This is much less useful for Army STA modeling than eye
movements that use search paths over real-world scenes.

6. Random, scene-covering, and object-driven eye movements. The easiest way to model search
is  either  as  a  random  series  of  eye  movements  (each  independent  of  the  previous)  or  as  a 
systematic  coverage  of  the  scene  from  top  to  bottom  or  side  to  side.  Research  cited  in  this  report 
indicates  that  neither  process  is  likely  for  humans  observing  real-world  scenes.  Eye  movements 
and  information  gathering  instead  are  functions  of  the  scene  and  types  of  targets  (Section  3.3). 

Usefulness and promise: Stark’s data on STA processes related to real objects in
real-world scenes possibly will be very useful in Army modeling. However, modeling techniques
are needed that will permit some scene generalization (to be programmable, every possible
type of scene in the world cannot be modeled individually) but that also reflect actual human
STA performance.

Discrepancies and omissions: Although his research indicates that random or
systematic scene coverage search patterns probably are unrealistic for natural scenes (except
possibly for visually covering an area using a slewing sensor), Stark uses such coverage for
the computer model described in this report (Section 5.3). This simplification may have been
useful to get a working program, but definitely must be replaced for future military STA
modeling.  However,  first  some  determinations  must  be  made  of  how  soldiers  process 
information  about  various  military  targets  in  different  kinds  of  combat  areas,  and  how  direct 
vision  processing  differs  from  that  using  slewing  sensors.  Generalities  about  such  processing 
then  will  be  needed  to  make  programming  feasible.  For  example,  if  all  tanks  are  examined  in 
approximately  the  same  detail  and  sequence,  a  generic  “tank”  search  process  then  can  be 
used  when  a  tank  is  the  target. 

7.  Hierarchy  of  search  tasks.  Stark  proposes  that  visual  search  can  be  modeled  as  a  series  of 
searches by matched filters (possibly templates with brightness characteristics and/or the outline
of  the  target;  Section  3.4).  A  coarse  filter  could  be  used  to  represent  the  detection  process,  then  a 
finer  one  for  recognition,  and  finally  a  quite  fine  one  for  identification.  This  approach  implies  that 
the  STA  process  from  detection  through  identification  is  a  continuum,  utilizing  the  same 
information  but  with  increasingly  fine  discrimination. 

Usefulness and promise: This approach could be used, but only for a very gross (and
inaccurate)  representation  of  the  human  search  process. 

Discrepancies  and  omissions:  The  multiple  filter  approach  suffers  from  the  same 
problems as Acquire’s Johnson line criteria approach and the Oracle fractional perimeter
approach.  All  three  assume  that  detection  and  identification  are  essentially  the  same 
cognitive  process  and  can  be  modeled  simply  by  modifying  a  single  parameter.  Combat 
models  will  fail  to  yield  real-world  results  as  long  as  such  a  simplistic  approach  to  human 
performance  is  included  in  STA  prediction  systems. 

8.  Straight  lines,  angles,  and  details.  Stark  suggests  that  scene  processing  which  removes  all  scene 
objects  except  vertical  and  horizontal  lines  might  represent  how  the  eye  tends  to  pick  such  lines  out 
of  the  background  (Section  3.4).  This  modeling  process  can  be  useful,  but  should  not  be  accepted 
in  isolation.  Target  angles  and  internal  patterns  also  appear  to  affect  discrimination  from  the 
background,  and  camouflage  can  fool  the  eye  so  that  straight  lines  may  not  be  seen  as  straight. 

Usefulness  and  promise:  Filtering  a  scene  to  remove  everything  except  straight  lines 
may  be  one  useful  step  in  modeling  perception.  This  technique  has  been  used  for  automatic 
target  detection  systems. 

Discrepancies  and  omissions:  This  process,  if  used  alone,  would  give  incorrect  results 
for  detection  probabilities  (and  certainly  for  recognition  and  identification).  If  such  a  model 
is  included,  additional  processes  are  required  that  consider  the  effect  on  STA  of  target  angles 
(as  noted  by  Stark,  storing  an  image  as  a  sequence  of  segment  lengths  and  connecting  angles 
can  be  quite  efficient).  The  effect  of  internal  detail  also  cannot  be  ignored,  especially  for 
prediction  of  the  probability  of  recognition  and  identification. 

9.  Modeling  of  clutter.  Stark  notes  that  clutter  in  the  target  area  cannot  be  modeled  simply  as 
“noise.”  Instead,  it  must  be  modeled  as  unwanted  signals  that  can  be  confused  with  the  targets 
(Section  3.4).  His  approach  in  the  computer  program  discussed  in  this  report  has  been  to  include 
random  background  noise,  but  also  to  include  objects  that  are  similar  to  the  targets  (decoys;  see 
Section  5.3). 

Usefulness and promise: Inclusion of decoys so a false alarm rate can be modeled in
the STA process is useful, especially since deception is commonly used on the battlefield.

Discrepancies and omissions: Modeling other clutter (besides decoys) as random noise
is an oversimplification. It ignores the fact that some object shapes and luminances are more
conspicuous than others and can catch the attention (even if they do not look like targets)
and delay detection of the real targets. More realistic models of battlefield clutter are needed.

10.  Theoretical  model  of  visual  search.  The  components  of  Stark’s  theoretical  model  include  those 
generally  accepted  as  strongly  influencing  STA  behavior:  (1)  target,  background  and  environment, 
(2) eye movement and retinal images, (3) human search behavior, (4) cognitive processes, and
(5) eye and brain physiology (Section 5.1).

Usefulness  and  promise:  Stark’s  list  of  factors  to  be  included  in  the  model  can  serve  as 
a  useful  checklist,  to  ensure  that  none  of  these  critical  model  components  has  been  left  out. 

The  physics  of  the  search  scene  have  been  studied  in  detail  for  years,  and  modeling  should  be 
straightforward  (although  Stark  does  not  address  the  importance  of  having  clear  line  of  sight 
between  observer  and  target;  this  too  often  is  taken  for  granted  in  models).  Stark’s  own 
work  provides  some  of  the  data  needed  to  model  human  search  behavior.  The  effects  of  target 
size  and  contrast  and  of  image  eccentricity  on  the  retina  have  been  successfully  modeled  in 
other  systems. 

Discrepancies  and  omissions:  Much  more  difficult  to  implement  are  the  cognitive 
components  of  STA  models.  Most  of  the  relevant  STA  research  has  been  carried  out  by 
scientists  such  as  physiologists  and  physicists  who  study  questions  which  either  have 
deterministic  answers  or  which  can  be  modeled  using  known  probabilistic  distributions.  Very 
little  work  has  been  done  by  psychologists,  who  are  willing  to  study  uniquely  human  factors 
that include numerous interacting elements, and that often yield “sometimes” answers best
modeled using Zadeh’s fuzzy set and fuzzy logic theory. The lack of required human
performance data is a major problem for implementing Stark’s theories in Army STA
combat models.

11.  Prototype  computer  version  of  visual  search  model.  Although  a  prototype  computer  model  has 
been  constructed  (presumably  based  on  Stark’s  theoretical  model  of  visual  search)  this  computer 
program  does  little  except  illustrate  how  many  approximations  are  needed  at  present  to  model 
human  search  behavior  (Section  5.2). 

Usefulness  and  promise:  While  the  theoretical  model  shows  promise,  the  available 
computerized  prototype  is  little  more  than  a  skeleton  on  which  a  model  might  be  hung  at  some 
later  time.  Operations  are  carried  out  in  a  logical  order,  and  it  might  be  possible  to  substitute 
real  subroutines  and  procedures  for  the  current  approximations.  But  for  now  there  is  little  of 
substance  in  the  program. 

Discrepancies  and  omissions:  The  program  is  simply  too  sketchy  at  present  to  warrant 
listing all of its problems.

Glossary


This list provides definitions of terms as they are used by Stark and in this report. Although some
definitions may differ from those used by other researchers, the following set reflects the terminology
and meanings commonly used by Stark and his coworkers.

Detection:  the  process  of  determining  that  there  is  something  of  interest  in  the  scene  that  should  be 
inspected. 

Eye  movements:  brain-directed  movements  of  the  eyeball  from  one  fixation  point  to  the  next. 

Fovea:  the  1-  to  2-arc  degree  area  of  the  retina  located  on  the  visual  axis,  upon  which  fixated  objects 
are  imaged.  The  fovea  provides  color  vision  and  high-resolution  imaging  of  objects. 

Gaze  movement:  movement  of  the  eye  in  space,  including  both  head  movements  and  eye  movements, 
as  the  observer  inspects  the  field  of  regard. 

Identification:  the  process  of  determining  that  an  object  which  has  been  recognized  matches  its 
detailed  internal  representation  closely  enough  to  be  considered  identical  to  that  internal 
representation  (often  used  interchangeably  with  recognition  in  the  Stark  literature). 

Internal  representation:  components  or  features  of  a  previously-observed  object  that  are  stored  in  the 
brain's  memory  system,  then  matched  step  by  step  with  the  object  when  it  is  viewed  again  and  the 
observer  attempts  to  recognize  it. 

Location:  the  process  of  determining  where  a  specific  object  is  placed  in  the  scene,  with  respect  to 
other  objects  in  the  scene. 

Object  features:  the  components  of  an  image  that  hold  the  most  information  about  that  image, 
primarily  the  points  where  line  directions  change  most  abruptly. 

Recognition:  the  process  of  matching  the  image  of  an  object  with  its  gross  internal  representation  in 
the memory system (often used interchangeably with identification in the Stark literature).

Saccades:  eye  movements  occurring  at  about  3  per  second  but  occupying  only  about  10  per  cent  of 
total  viewing  time.  Vision  is  suppressed  during  saccades,  so  that  almost  all  visual  information  is 
collected  during  fixations. 

Scanning:  a  process  involving  active,  regular  eye  movements  over  an  area  or  target,  with  the  goal  of 
recognizing  and  possibly  identifying  what  is  looked  at.  The  observer  inspects  the  scene  or  object 
and  compares  its  features  with  those  of  stored  cognitive  models.  Scanning  usually  ends  when  the 
recognition  or  identification  process  is  complete. 

Scanpath: the repetitive and idiosyncratic sequence of eye movements used for inspecting and
recognizing particular familiar objects during normal viewing of scenes, people, and objects. The
scanpath is shaped by a cognitive model of the guessed object and the organization of its
subfeatures. This cognitive model controls active looking to result in efficient acquisition of visual
information. Scanpath sequences occupy from about 25 to 30 per cent of the observer’s viewing
time, the rest consisting of less regular eye movements.

Scanpath  feature  ring:  a  serial  model  of  the  internal  representation  of  objects.  The  model  maintains 
that  representations  of  objects  are  composed  of  sensory  memory  traces  recording  object  features 
and  of  motor  memory  traces  of  the  eye  movements  from  one  feature  to  another.  As  an  observer 
views  an  object  for  the  first  time  and  becomes  familiar  with  it,  he  or  she  alternately  records  a 
feature  of  the  object  and  the  eye  movement  required  to  reach  the  next  feature.  The  memory  traces 
of  the  feature  ring  thus  are  laid  down,  as  both  sensory  and  motor  activities  are  recorded.  When  the 
object  is  next  encountered,  the  observer  recognizes  it  by  matching  it  with  the  feature  ring. 

Scanpath  theory:  a  theory  that  suggests  that  eye  movements  are  controlled  by  internal  cognitive 
models  already  present  in  the  brain,  and  that  predicts  similar  sequences  of  visual  fixations  for  a 
given  observer  looking  repeatedly  at  a  particular  image. 

Search  path:  an  efficient,  repetitive,  idiosyncratic  sequence  of  eye  movements  utilizing  only  a  small 
set  of  fixations  to  check  a  number  of  features  in  the  scene,  generally  the  targets.  This  sequence 
occurs  during  observation  of  "natural"  scenes  with  naturally  distributed  targets,  and  leads  to
inspection  of  the  expected  locations  of  the  targets.  Search  paths  tend  to  be  unique  to  the  observed 
scene.  They  are  shaped  and  driven  by  internal  models  of  the  spatial  organization  of  target
objects  in  a  3-D  "natural"  scene,  and  by  knowledge  accumulated  from  previous  experience. 

Search  pattern:  an  efficient,  repetitive,  idiosyncratic  sequence  of  eye  movements  carrying  the  eye 
systematically  to  cover  an  entire  2-D  scene.  This  sequence  occurs  during  observation  of  an 
apparently  random  distribution  of  objects  located  in  a  search  area,  when  the  observer  is  not 
familiar  with  the  area's  spatial  organization  (if  any).  Search  patterns  can  be  regular  (horizontal, 
vertical,  oscillating,  circular)  or  irregular  (essentially  random).  They  usually  are  modes  of 
covering  a  search  area  in  an  efficient  and  thorough  manner  when  no  information  is  available  to 
shape  the  search. 
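
As a rough illustration of a regular search pattern, the sketch below generates a back-and-forth horizontal sweep of fixation points over a rectangular 2-D search area. The function name sweep_search_pattern, the grid layout, and the uniform fixation spacing are assumptions made for the example; the source does not specify how fixations are spaced.

```python
from typing import List, Tuple


def sweep_search_pattern(n_cols: int, n_rows: int,
                         spacing: float = 1.0) -> List[Tuple[float, float]]:
    """Return fixation points that cover a grid row by row.

    Even rows are swept left to right and odd rows right to left,
    so consecutive fixations are always adjacent.
    """
    fixations = []
    for row in range(n_rows):
        if row % 2 == 0:
            cols = range(n_cols)
        else:
            cols = range(n_cols - 1, -1, -1)
        for col in cols:
            fixations.append((col * spacing, row * spacing))
    return fixations


# A 3-by-2 area is covered in six fixations.
pattern = sweep_search_pattern(3, 2)
```

Every grid cell is visited exactly once, which matches the idea of covering the area thoroughly when nothing is known that could shape the search.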

Serial  recognition  hypothesis:  during  recognition,  the  features  of  an  observed  object  are  matched 
serially,  step  by  step,  with  the  features  of  an  internal  representation  that  may  match  that  object.
Successful  matching  of  all  features  completes  recognition. 

Visual  lobe:  the  probability  of  acquiring  a  target,  plotted  against  the  off-axis  angle  of  the  target  image 
on  the  retina.  The  distribution  of  acquisition  probability  over  off-axis  angle  constitutes  a  soft-shell  visual  lobe.

When  a  constant  probability  of  acquisition  is  assumed  over  several  degrees  (or  over  the  entire
retina),  the  model  is  referred  to  as  a  hard-shell  (or  "cookie  cutter")  visual  lobe.
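
The difference between the two lobe models can be sketched as two functions of off-axis angle. The Gaussian falloff, the 2-degree widths, and the function names are assumptions chosen only for illustration; the source does not give a functional form for the soft-shell lobe.

```python
import math


def hard_shell_lobe(angle_deg: float, radius_deg: float = 2.0,
                    p_inside: float = 1.0) -> float:
    """Cookie-cutter lobe: constant probability inside the radius, zero outside."""
    return p_inside if abs(angle_deg) <= radius_deg else 0.0


def soft_shell_lobe(angle_deg: float, p_on_axis: float = 1.0,
                    sigma_deg: float = 2.0) -> float:
    """Soft-shell lobe: probability falls off smoothly with off-axis angle.

    A Gaussian falloff is assumed here purely as an example shape.
    """
    return p_on_axis * math.exp(-0.5 * (angle_deg / sigma_deg) ** 2)


# Compare the two models at a few off-axis angles (degrees).
for angle in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(angle, hard_shell_lobe(angle), round(soft_shell_lobe(angle), 3))
```

The hard-shell model drops abruptly from its constant value to zero at the lobe boundary, whereas the soft-shell model assigns a small but nonzero probability to targets well off axis.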

Visual  periphery:  the  retinal  area  outside  the  fovea,  extending  out  to  the  limits  of  vision.  The
periphery  provides  low-resolution  black  and  white  vision  and  is  especially  good  at  motion
detection.  Scene  features  often  are  tentatively  located  in  the  periphery  of  the  retina,  then  fixated
directly  for  detailed  inspection. 

Visual  search:  general  term  for  a  process  involving  active  eye  movements  that  cover  or  encompass  a 
scene  or  the  field  of  regard,  with  the  goal  of  locating  (and  possibly  recognizing)  a  desired  object. 